Algorithms for Scheduling Malleable Cloud Tasks

Xiaohu Wu and Patrick Loiseau 


Abstract —Due to the ubiquity of batch data processing in cloud computing, the related problem of scheduling malleable batch tasks 
and its extensions have received significant attention recently. In this paper, we consider a fundamental model where a set of n tasks is 
to be processed on C identical machines and each task is specified by a value, a workload, a deadline and a parallelism bound. Within 
the parallelism bound, the number of machines assigned to a task can vary over time without affecting its workload. For this model, we 
obtain two core results: a sufficient and necessary condition such that a set of tasks can be finished by their deadlines on C machines, 
and an algorithm to produce such a schedule. These core results provide a conceptual tool and an optimal scheduling algorithm that 
enable proposing new algorithmic analysis and design and improving existing algorithms under various objectives. 
1 Introduction

Cloud computing has become the norm for a wide range of applications, and batch processing constitutes the most significant computing paradigm [1]. In particular, many applications such as web search index updates, Monte Carlo simulations, and big-data analytics require the execution on computing clusters of a new type of parallel tasks, termed malleable tasks. Two basic features of malleable tasks are workload and parallelism bound; during execution, the number of machines assigned to a task can vary over time within the parallelism bound, but its workload is not affected by the number of machines used [2], [3].

In scheduling theory, the above task model can be viewed as an extension of the classic model of scheduling preemptive tasks on a single or multiple machines where the parallelism bound is one [4], [5]. Beyond understanding how to schedule this fundamental task model, many efforts are also devoted to its online version [6], [7], [8] and its extension in which each task contains several subtasks with precedence constraints [9], [10]. In practice, for better efficiency, companies such as IBM have integrated smarter scheduling algorithms for various time metrics (than the popular dominant resource fairness strategy) into their batch processing platforms for malleable tasks [10], [11].

In this paper, our goal is to provide a relatively thorough understanding of the fundamental task model in [2], [3], producing scheduling algorithms for various objectives.
Results from the special single machine case have already 
implied that the problem of optimally scheduling tasks with 
deadlines on machines would play a key role in achieving 
our goal 0, |l21. In particular, the famous EDF (Earliest 
Deadline First) rule can achieve an optimal schedule for the 
single machine case. It is initially designed so as to find an 
exact algorithm for scheduling batch tasks to minimize the 
maximum task lateness (i.e., task's completion time minus 
due date). So far, many applications of this rule have been
found (i) to design exact algorithms for the extended model 
with release times and for scheduling with deadlines (and 
release times) to minimize the total weight of late tasks, and 
(ii) as a significant principle in schedulability analysis for 
real-time systems. 
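As a reminder of how the EDF rule works in the single-machine case, the following is a minimal sketch, assuming all tasks are available at time 0 and each task is given as a hypothetical pair `(processing_time, due_date)`. It processes tasks in order of due date and reports the maximum lateness; a set of tasks can meet all deadlines in this setting exactly when the returned value is at most 0. The function name and the input layout are illustrative choices, not notation from this paper.

```python
def edf_max_lateness(tasks):
    """Maximum lateness under the Earliest Deadline First order.

    tasks: list of (processing_time, due_date) pairs on a single machine,
    all released at time 0 (an assumption made for this sketch).
    """
    finish = 0
    worst = float("-inf")
    # EDF: process tasks in non-decreasing order of due date.
    for processing_time, due_date in sorted(tasks, key=lambda task: task[1]):
        finish += processing_time
        worst = max(worst, finish - due_date)
    return worst


if __name__ == "__main__":
    print(edf_max_lateness([(2, 3), (1, 1), (3, 7)]))
```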

In the following, we review the related work to explain what motivates the series of new techniques that we propose to achieve the goal of this paper.

1.1 Related works 

The linear programming approaches to designing and analyzing algorithms for the task model of this paper [2], [3] and its variants (e.g., [9]) have been well studied (see footnote 1). All these works consider the same objective of maximizing the social welfare, i.e., the sum of values of tasks completed by their deadlines. In [2], Jain et al. propose an algorithm with an approximation ratio of $\left(1 + \frac{C}{C-k}\right)(1 + \epsilon)$ via deterministic rounding of linear programming. Subsequently, Jain et al. [3] propose a greedy algorithm GreedyRTL and use the dual-fitting technique to derive an approximation ratio of $\frac{C-k}{C} \cdot \frac{s-1}{s}$. Here, $k$ is the maximum parallelism bound of tasks, and $s$ ($\ge 1$) is the slackness, which intuitively characterizes the
resource allocation urgency (e.g., s = 1 means that the 
maximum amount of machines have to be allocated to a task 
at every time slot to meet its deadline). In practice, the tasks 
tend to be recurring, e.g., they are scheduled periodically 
on an hourly or daily basis 0. Hence, we can assume that 
the maximum deadline of tasks is finitely bounded by a 
constant $d$. In addition, the parallelism bound $k$ is usually a system parameter and is also finite [11]. In this sense, the GreedyRTL algorithm has a polynomial time complexity of $O(n^2)$.

In [9], Bodik et al. consider an extension of our task model, i.e., DAG-structured malleable tasks, and, based on randomized rounding of linear programming, they propose an algorithm with an expected approximation ratio of $\alpha(\lambda)$ for every $\lambda > 0$; the closed form of $\alpha(\lambda)$, which involves $\lambda$, $C$, and $k$, is given in [9]. The online version of our task model has also been considered; again based on the dual-fitting technique, two weighted greedy algorithms are proposed for non-committed and committed scheduling, respectively. The competitive ratio for non-committed scheduling is $2 + O\left(\frac{1}{(\sqrt[3]{s} - 1)^2}\right)$, where $s > 1$; the ratio for committed scheduling additionally depends on a parameter $\omega \in (0, 1)$ and holds under a stronger lower bound on $s$.

1. We refer readers to [5], [13] for more details on the general techniques to design scheduling algorithms.

All the works 0 , 0 , 0 , 0 ,@ formulate their problem 
as an integer program (IP) and relax the IP to a relaxed linear 
program (LP). The techniques in 0 0 require to solve the 
LP to obtain a fractional optimal solution and then manage 
to round the fractional solution to an integer solution of 
the IP that corresponds to an approximate solution to their 
original problem. In N, 0 , the dual fitting technique 
first finds the dual of the LP and then construct a feasible 
algorithmic solution X to the dual in some greedy way. This 
solution corresponds to a feasible solution Y to their original 
problems, and, due to the weak duality, the value of the dual 
under the solution X (expressed in the form of the value 
under Y multiplied by a parameter a > 1) will be an upper 
bound of the optimal value of the IP, i.e., the optimal value 
that can be achieved in the original problem. Therefore, the 
approximation ratio of the algorithm involved in the dual 
becomes clearly 1/a. Here, the approximation ratio is a 
lower bound of the ratio of the actual value obtained by 
the algorithm to the optimal value. 
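The dual-fitting argument described above can be condensed into one chain of inequalities. The symbols $\mathrm{val}(Y)$, $\mathrm{dual}(X)$, $\mathrm{OPT}_{\mathrm{LP}}$ and $\mathrm{OPT}_{\mathrm{IP}}$ are shorthand introduced here for illustration only; they are not notation taken from the cited works.

```latex
\[
\alpha \cdot \mathrm{val}(Y) \;=\; \mathrm{dual}(X)
\;\ge\; \mathrm{OPT}_{\mathrm{LP}}
\;\ge\; \mathrm{OPT}_{\mathrm{IP}}
\quad\Longrightarrow\quad
\mathrm{val}(Y) \;\ge\; \frac{1}{\alpha}\,\mathrm{OPT}_{\mathrm{IP}} .
\]
```

The first inequality is weak duality for a maximization LP, and the second holds because the LP is a relaxation of the IP.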

Using these techniques based on LP, it is difficult for 
us to understand how to design more efficient or other 
types of algorithms to schedule malleable tasks. Indeed, the 
algorithmic design in 0 has to rely on the LP formulation. 
However, for the greedy algorithm in 0, we can seek 
a different angle than the dual fitting technique to finely 
understand a basic question: what resource allocation features 
of tasks can benefit the performance of a greedy algorithm ? This 
question is related to the scheduling objective. Further, we 
will prove that answering the secondary question "how could we achieve an optimal schedule so that C machines are optimally utilized by a set of malleable tasks with deadlines in terms of resource utilization?" plays a core role in (i) understanding the
above basic question posed in 0 , (ii) applying the dynamic 
programming technique to the problem in 0 > and (iii) 
designing algorithms for other objectives. Intuitively, for any scheduling objective, an algorithm would be non-optimal if the machines are not optimally utilized, and its performance can be improved by optimally utilizing the machines to allow more tasks to be completed.

In addition, Nagarajan et al. [10] consider DAG-structured malleable tasks and propose two algorithms with approximation ratios of 6 and 2, respectively, for the objectives of minimizing the total weighted completion time and the maximum weighted lateness of tasks. The conclusions in [10] also show that scheduling deadline-sensitive malleable tasks is key to solving the scheduling problems for their objectives. In particular, seeking a schedule for DAG tasks can be transformed into seeking a schedule for tasks with simpler chain-precedence constraints; then, whenever there is a feasible schedule that completes a set of tasks by their deadlines, Nagarajan et al. propose an algorithm in which each task is completed by at most 2 times its deadline, and give two procedures to obtain near-optimal completion times of tasks in terms of the two scheduling objectives.

1.2 Contributions 

In this paper, we propose a new conceptual framework 
to address the related scheduling problems when malleable 
batch tasks are considered. As discussed in GreedyRTL, we 
assume that the maximum deadline to complete a task and 
the maximum parallelism bound of tasks can be finitely 
bounded by constants. The results of this paper are sum¬ 
marized as follows. 

Core result. The core result of this paper is the first optimal 
scheduling algorithm so that C machines are optimally 
utilized by a set of malleable batch tasks S with deadlines 
in terms of resource utilization. 

We first understand the basic constraints of malleable tasks and identify the optimal state in which $C$ machines can be said to be optimally utilized by a set of tasks. Then, we propose a scheduling algorithm LDF($\mathcal{S}$) (Latest Deadline First) that achieves such an optimal state. The LDF($\mathcal{S}$) algorithm has a polynomial time complexity of $O(n^2)$ and is different from the EDF algorithm that gives an optimal schedule in the single-machine case.

Applications. The above core results have several appli¬ 
cations in proposing new or improved algorithmic design 
and analysis for scheduling malleable tasks with various 
objectives. In particular, we provide: 

(i) an improved greedy algorithm GreedyRLM with an approximation ratio of $\frac{s-1}{s}$ for the social welfare maximization objective, with a time complexity of $O(n^2)$;

(ii) the first exact dynamic programming (DP) algorithm for the social welfare maximization objective, with a pseudo-polynomial time complexity of $O(\max\{n^2, n d^L C^L\})$;

(iii) the first exact algorithm for the machine minimization objective, with a time complexity of $O(n^2)$;

(iv) an improved algorithm for the objective of minimizing the maximum weighted lateness, reducing the previous approximation ratio by a factor of 2.

Here, $L$, $D$, and $d$ are the number of deadlines, the maximum workload, and the maximum deadline of tasks, respectively.

We further make some notes on the first and second algorithms above.

In this paper, we also prove that $\frac{s-1}{s}$ is the best approximation ratio that a general greedy algorithm can achieve. Although GreedyRLM only improves GreedyRTL in [3] marginally when $C \gg k$, theoretically it is optimal. The second algorithm can work efficiently only when $L$ is small since its time complexity is exponential in $L$. However, this may be reasonable in a machine scheduling context. In scenarios like those in [15], tasks are often scheduled periodically, e.g., on an hourly or daily basis, and many tasks have a relatively soft deadline (e.g., finishing after four hours instead of three will not trigger a financial penalty). Then, the scheduler can negotiate with the tasks and select an appropriate set of deadlines $\{\tau_1, \tau_2, \dots, \tau_L\}$, thereafter rounding the deadline of a task down to the closest $\tau_i$ ($1 \le i \le L$). By reducing $L$, this could permit the use of the DP algorithm rather than GreedyRLM in the case where the slackness $s$ is close to 1. With $s$ close to 1, the approximation ratio of GreedyRLM approaches 0 and possibly little social welfare is obtained by adopting GreedyRLM, while the DP algorithm can still obtain almost optimal social welfare.

Finally, the second algorithm can be viewed as an extension of the pseudo-polynomial time exact algorithm in the single machine case [4] that is also designed via the generic dynamic programming procedure. However, before
our work, how to enable this extension was an open prob¬ 
lem as stated in 0' 0 We will show that this is mainly 
due to the conceptual lack of the optimal state of machines 
being utilized by malleable tasks with deadlines and the 
lack of an algorithm that achieves such state. In contrast, 
the optimal resource utilization state in the single machine 
case can be defined much more easily and be achieved by 
the existing EDF algorithm. The core result of this paper fills 
the above gap and is the enabler of a DP algorithm. The way 
of applying the core result to design a greedy algorithm is 
less obvious since in the single machine case there is no 
corresponding algorithm to hint its role in the design of 
a greedy algorithm. So, new insights (into the above basic question) are required to enable this application and will be obtained through a complex new algorithmic analysis that does not rely on the dual-fitting technique in [3].

The remainder of this paper is organized as follows. 
In Section [2] we introduce the machine and task model 
and the scheduling objectives considered in this paper. In 
Section [3] we identify the optimal resource utilization state 
and propose a scheduling algorithm that achieves such a 
state. In Section [4] we show four applications of the results 
in Section [3] to different algorithmic design techniques and 
scheduling objectives. Finally, we conclude the paper in Section 5.

2 Model 

There are $C$ identical machines and a set of $n$ tasks $\mathcal{T} = \{T_1, T_2, \dots, T_n\}$. The task $T_i$ is specified by several characteristics: (1) value $v_i$, (2) demand (or workload) $D_i$, (3) deadline $d_i$, and (4) parallelism bound $k_i$. Time is discrete and the time horizon is divided into $d$ time slots $\{1, 2, \dots, d\}$, where $d = \max_{T_i \in \mathcal{T}} d_i$ and the length of each slot may be a fixed number of minutes. A task $T_i$ can only utilize the machines located in the time slot interval $[1, d_i]$. The parallelism bound $k_i$ means that, at any time slot $t$, $T_i$ can be executed on at most $k_i$ machines simultaneously. Let $k = \max_{T_i \in \mathcal{T}} k_i$ be the maximum parallelism bound; $k_i$ is usually a system parameter and so $k$ is assumed to be finite [11]. The allocation of machines to a task $T_i$ is a function $y_i : [1, d_i] \to \{0, 1, 2, \dots, k_i\}$, where $y_i(t)$ is the number of machines allocated to task $T_i$ at a time slot $t \in [1, d_i]$. So, the model here also implies that $D_i, d_i \in \mathbb{Z}^+$ for all $T_i \in \mathcal{T}$. For the system of $C$ machines, denote by $W(t) = \sum_{i=1}^{n} y_i(t)$ the workload of the system at time slot $t$, and by $\overline{W}(t) = C - W(t)$ its complement, i.e., the number of available machines at time $t$. We say that time $t$ is fully utilized (resp. saturated) if $\overline{W}(t) = 0$ (resp. $\overline{W}(t) < k$), and not fully utilized (resp. unsaturated) otherwise, i.e., if $\overline{W}(t) > 0$ (resp. $\overline{W}(t) \ge k$). In addition, we assume that the maximum deadline of tasks is bounded.

Given the model above, the following three scheduling 
objectives will be addressed separately in this paper. 


TABLE 1
Main Notation

| Notation | Explanation |
|---|---|
| $C$ | the number of machines |
| $\mathcal{T}$ | a set of tasks to be processed |
| $T_i$ | a task |
| $D_i$, $d_i$, $v_i$ | the workload, deadline, and value of $T_i$ |
| $k_i$ | the parallelism bound of $T_i$, i.e., the maximum number of machines that can be allocated to and utilized by $T_i$ simultaneously |
| $y_i(t)$ | the number of machines allocated to $T_i$ at a time slot $t$, $y_i(t) \in \{0, 1, \dots, k_i\}$ |
| $W(t)$ | the number of machines that are allocated out and utilized by the tasks at $t$, i.e., $W(t) = \sum_{T_i \in \mathcal{T}} y_i(t)$ |
| $\overline{W}(t)$ | $C - W(t)$, i.e., the number of remaining machines available at $t$ |
| $len_i$ | the minimum execution time of $T_i$, where $T_i$ is allocated $k_i$ machines in the entire execution process, i.e., $len_i = \lceil D_i / k_i \rceil$ |
| $s_i$ | the slackness of a task, i.e., $d_i / len_i$, measuring the urgency of machine allocation to complete $T_i$ by the deadline |
| $s$ | $\min_{T_i \in \mathcal{T}} s_i$, i.e., the minimum slackness of $\mathcal{T}$ |
| $d$, $D$ | the maximum deadline and workload of $\mathcal{T}$, i.e., $\max_{T_i \in \mathcal{T}} d_i$ and $\max_{T_i \in \mathcal{T}} D_i$ |
| $v'_i$ | the marginal value of $T_i$, i.e., $v'_i = v_i / D_i$ |
| $\{\tau_1, \dots, \tau_L\}$ | the set of all the deadlines $d_i$ of $T_i \in \mathcal{T}$, where $0 = \tau_0 < \tau_1 < \dots < \tau_L = d$ |
| $\mathcal{D}_i$ | all the tasks $\{T_{i,1}, T_{i,2}, \dots, T_{i,n_i}\}$ in $\mathcal{T}$ that have a deadline $\tau_i$, $1 \le i \le L$ |


The first objective is to choose a subset $\mathcal{S} \subseteq \mathcal{T}$ and produce a feasible schedule for $\mathcal{S}$ so as to maximize the social welfare $\sum_{T_i \in \mathcal{S}} v_i$ (i.e., the sum of values of tasks completed by deadlines); here, the value $v_i$ of a task $T_i$ can be obtained only if it is fully allocated by the deadline, i.e., $\sum_{t \le d_i} y_i(t) = D_i$, and partial execution of a task yields no value. The second objective is to minimize the number of machines $C$ so that there exists a feasible schedule of $\mathcal{T}$ on $C$ machines. In the two objectives above, a feasible schedule means: (i) every task is fully allocated by its deadline and the parallelism constraint is not violated, and (ii) at every time slot $t$ the number of used machines is no more than $C$, i.e., $W(t) \le C$. The third objective is to minimize the maximum weighted lateness of tasks, i.e., $\max_{T_i \in \mathcal{T}} \{v_i \cdot (t_i - d_i)\}$, where $t_i$ is the completion time of a task $T_i$.
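The model and the two feasibility conditions above can be written down directly. The following is a minimal sketch, assuming a schedule is stored as one dictionary per task mapping a time slot $t$ to $y_i(t)$; the class name `Task`, its field names, and the function names are illustrative choices and not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    value: float   # v_i
    demand: int    # D_i, integer workload
    deadline: int  # d_i, last usable time slot
    bound: int     # k_i, parallelism bound


def workload(alloc, t):
    """W(t): machines in use at slot t; alloc[i][t] plays the role of y_i(t)."""
    return sum(y.get(t, 0) for y in alloc)


def available(alloc, t, C):
    """Complement of W(t): machines still free at slot t."""
    return C - workload(alloc, t)


def is_feasible(tasks, alloc, C):
    """Check conditions (i) and (ii) of a feasible schedule."""
    for task, y in zip(tasks, alloc):
        if any(t < 1 or t > task.deadline for t in y):
            return False  # allocation outside [1, d_i]
        if any(m < 0 or m > task.bound for m in y.values()):
            return False  # parallelism bound violated
        if sum(y.values()) != task.demand:
            return False  # task not fully allocated
    horizon = max(task.deadline for task in tasks)
    return all(workload(alloc, t) <= C for t in range(1, horizon + 1))
```

For instance, two tasks with demands 4 and 2, deadline 2, and bounds 2 and 1 fit on three machines but not on two.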

We now introduce more concepts to facilitate the algorithm analysis of this paper. We will denote by $[l]$ and $[l]^+$ the sets $\{0, 1, \dots, l\}$ and $\{1, 2, \dots, l\}$. Let $len_i = \lceil D_i / k_i \rceil$ denote the minimal length of execution time of $T_i$. Given a set of tasks $\mathcal{T}$, the deadlines $d_i$ of all tasks $T_i \in \mathcal{T}$ constitute a finite set $\{\tau_1, \tau_2, \dots, \tau_L\}$, where $L \le n$, $\tau_1, \dots, \tau_L \in \mathbb{Z}^+$, and $0 = \tau_0 < \tau_1 < \dots < \tau_L = d$. Let $\mathcal{D}_i = \{T_{i,1}, T_{i,2}, \dots, T_{i,n_i}\}$ denote the set of tasks with deadline $\tau_i$, where $\sum_{i=1}^{L} n_i = n$ ($i \in [L]^+$). Denote by $s_i = d_i / len_i$ the slackness of $T_i$, measuring the urgency of machine allocation (e.g., $s_i = 1$ may mean that $T_i$ should be allocated the maximum number of machines $k_i$ at every $t \in [1, d_i]$), and let $s = \min_{T_i \in \mathcal{T}} s_i$ be the slackness of the least flexible task ($s \ge 1$). Denote by $v'_i = v_i / D_i$ the marginal value, i.e., the value obtained by the system through executing one unit of demand of the task $T_i$. We assume that the demand of each task is an integer. Let $D = \max_{T_i \in \mathcal{T}} \{D_i\}$ be the demand of the largest task.
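The derived quantities just defined are simple to compute. The sketch below assumes each task is given as a hypothetical tuple `(v, D, d, k)`; the function names are illustrative and not part of the paper.

```python
from collections import defaultdict


def min_exec_len(D, k):
    """len_i = ceil(D_i / k_i), computed with integer arithmetic."""
    return -(-D // k)


def slackness(D, k, d):
    """s_i = d_i / len_i."""
    return d / min_exec_len(D, k)


def marginal_value(v, D):
    """v'_i = v_i / D_i."""
    return v / D


def group_by_deadline(tasks):
    """Return the sorted distinct deadlines tau_1 < ... < tau_L and, for each,
    the indices of the tasks having that deadline."""
    groups = defaultdict(list)
    for index, (_, _, d, _) in enumerate(tasks):
        groups[d].append(index)
    taus = sorted(groups)
    return taus, [groups[tau] for tau in taus]


if __name__ == "__main__":
    tasks = [(10, 6, 4, 2), (3, 3, 2, 3), (8, 4, 4, 1)]
    # Minimum slackness s over all tasks.
    print(min(slackness(D, k, d) for _, D, d, k in tasks))
    print(group_by_deadline(tasks))
```

In this example the third task has $len_i = 4 = d_i$, so its slackness is 1 and it must receive its single machine at every slot.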
Fig. 1. The blue areas denote the maximum demand of $T_j$ that needs to or could be processed in $[\tau_{L-m} + 1, \tau_L]$; the deadline and parallelism constraints mean that $T_j$ can utilize at most $k_j$ machines simultaneously. (Axes: number of machines versus time; marked quantities: $k_j$, $len_j$, $d_j$, $\tau_L$.)

Algorithm 1: LDF($\mathcal{S}$)
Output: A feasible allocation of machines to a set of tasks $\mathcal{S}$
1 for $m \leftarrow L$ to 1 do
2     while $\mathcal{S}_m \neq \emptyset$ do
3         Get $T_i$ from $\mathcal{S}_m$
4         Allocate-B($i$)
5         $\mathcal{S}_m \leftarrow \mathcal{S}_m - \{T_i\}$

Fig. 2. The colored areas in $[\tau_{2-m} + 1, \tau_2]$ in the left and right figures denote $\lambda_m(\mathcal{S})$ and $\lambda_m^C(\mathcal{S})$, where $m = 1, 2$, and $L = 2$.

Fig. 3. Relations among algorithms (GreedyRLM, LDF($\mathcal{S}$), Allocate-A($i$), Allocate-B($i$), Fully-Utilize($i$), AllocateRLM($i$)): for A → B, the blue and green arrows denote the relations that A calls B and B is executed upon completion of A.

The main notation of this paper is also listed in Table 1.


3 Optimal Schedule 

In this section, we identify a sufficient condition for C 
machines to be optimally utilized by a set of tasks T, i.e., the 
maximum amount of workload of T that could be executed 
in a fixed time interval on C machines; in the meantime, 
we propose a scheduling algorithm that achieves such an 
optimal state. 

3.1 Optimal Resource Utilization State 

Let $\mathcal{S} \subseteq \mathcal{T}$, and $\mathcal{S}_i = \mathcal{S} \cap \mathcal{D}_i$ ($i \in [L]^+$).

Definition 1. Assume that the number of machines is infinite, i.e., $C \to \infty$. Without the capacity constraint, we propose a procedure to define the maximum amount of resource $\lambda_m(\mathcal{S})$ that can be utilized by $\mathcal{S}$ in every time interval $[\tau_{L-m} + 1, \tau_L]$, $m \in [L]^+$:

• initially, set $\lambda_m(\mathcal{S})$ to zero.

• for every task $T_j \in \mathcal{S}$, as illustrated in Figure 1, the following computation is executed:

- if $len_j \le d_j - \tau_{L-m}$, $\lambda_m(\mathcal{S}) \leftarrow \lambda_m(\mathcal{S}) + D_j$,

- otherwise, $\lambda_m(\mathcal{S}) \leftarrow \lambda_m(\mathcal{S}) + k_j \cdot (d_j - \tau_{L-m})$.

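Definition 2. Now, we assume that the total number of machines $C$ is finite and, given the capacity constraint, we define the maximum amount of resource $\lambda_m^C(\mathcal{S})$ that can be utilized by $\mathcal{S}$ on $C$ machines in every $[\tau_{L-m} + 1, \tau_L]$, $m \in [L]^+$:

• $\lambda_0^C(\mathcal{S}) \leftarrow 0$;

• set $\lambda_m^C(\mathcal{S})$ to the sum of $\lambda_{m-1}^C(\mathcal{S})$ and $\min\{\lambda_m(\mathcal{S}) - \lambda_{m-1}^C(\mathcal{S}),\ C \cdot (\tau_{L-m+1} - \tau_{L-m})\}$.

Definition 2 is illustrated in Figure 2, where $\lambda_1^C(\mathcal{S}) = C \cdot (\tau_2 - \tau_1)$ since $\lambda_1(\mathcal{S}) > C \cdot (\tau_2 - \tau_1)$, and $\lambda_2^C(\mathcal{S}) = \lambda_2(\mathcal{S})$.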
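Definitions 1 and 2 translate into two short loops. The sketch below assumes each task is a hypothetical tuple `(D, d, k)` and that `taus` is the list $[\tau_0 = 0, \tau_1, \dots, \tau_L]$; the function names are illustrative. It also assumes that a task whose deadline does not reach into the interval contributes nothing, which is how the non-positive window case is read here.

```python
def lambda_unbounded(tasks, taus, m):
    """lambda_m(S) of Definition 1 (no capacity constraint)."""
    L = len(taus) - 1
    start = taus[L - m]  # the interval is [start + 1, tau_L]
    total = 0
    for D, d, k in tasks:
        window = d - start
        if window <= 0:
            continue  # assumption: the task cannot use this interval at all
        length = -(-D // k)  # len_j = ceil(D_j / k_j)
        total += D if length <= window else k * window
    return total


def lambda_capacity(tasks, taus, C):
    """[lambda_0^C(S), ..., lambda_L^C(S)] of Definition 2."""
    L = len(taus) - 1
    lam_c = [0]
    for m in range(1, L + 1):
        capacity = C * (taus[L - m + 1] - taus[L - m])
        extra = min(lambda_unbounded(tasks, taus, m) - lam_c[m - 1], capacity)
        lam_c.append(lam_c[m - 1] + extra)
    return lam_c


if __name__ == "__main__":
    tasks = [(6, 4, 3), (4, 2, 2)]  # (D, d, k)
    taus = [0, 2, 4]
    print(lambda_unbounded(tasks, taus, 1), lambda_unbounded(tasks, taus, 2))
    print(lambda_capacity(tasks, taus, 2))
```

With two machines, the first task alone could use 6 units in slots 3 and 4, but the capacity there is only 4, so $\lambda_1^C$ is capped at 4.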


Lemma 3.1. $\lambda_m^C(\mathcal{S})$ is the maximum (optimal) workload of $\mathcal{S}$ that could be processed in the time slot interval $[\tau_{L-m} + 1, \tau_L]$ on $C$ machines, $m \in [L]^+$.

Proof. By induction. When $m = 0$, the lemma holds trivially. Assume that this lemma holds when $m = l$. If $\lambda_{l+1}(\mathcal{S}) - \lambda_l^C(\mathcal{S}) < C \cdot (\tau_{L-l} - \tau_{L-l-1})$, it means that with the capacity constraint in $[\tau_{L-l-1} + 1, d]$, $\mathcal{S}$ can still utilize $\lambda_{l+1}(\mathcal{S})$ resource in $[\tau_{L-l-1} + 1, d]$ and this lemma holds; otherwise, after $\mathcal{S}$ has utilized the maximum amount of $\lambda_l^C(\mathcal{S})$ resources in $[\tau_{L-l} + 1, d]$, it can only utilize $C \cdot (\tau_{L-l} - \tau_{L-l-1})$ resources in $[\tau_{L-l-1} + 1, \tau_{L-l}]$. The lemma holds when $m = l + 1$. □


According to Lemma 3.1, for $m \in [L]^+$, if $\mathcal{S}$ utilizes $\lambda_m^C(\mathcal{S})$ resources in every $[\tau_{L-m} + 1, d]$ on $C$ machines, an optimal state is achieved in which $C$ machines are optimally utilized by the tasks. Let $\mu_m^C(\mathcal{S}) = \sum_{T_i \in \mathcal{S}} D_i - \lambda_{L-m}^C(\mathcal{S})$ denote the remaining workload of $\mathcal{S}$ that needs to be processed after $\mathcal{S}$ has optimally utilized $C$ machines in $[\tau_m + 1, \tau_L]$.


Lemma 3.2. A necessary condition for the existence of a feasible schedule for a set of malleable batch tasks with deadlines $\mathcal{S}$ is the following:

$\mu_m^C(\mathcal{S}) \le C \cdot \tau_m$, for all $m \in [L]$.

Proof. By the meaning of the parameter $\lambda_m^C(\mathcal{S})$ in Lemma 3.1, after $\mathcal{S}$ has optimally utilized the machines in $[\tau_m + 1, d]$, if there exists a feasible schedule for $\mathcal{S}$, the total amount of the remaining demands in $\mathcal{S}$ should be no more than the capacity $C \cdot \tau_m$ in $[1, \tau_m]$. □

We refer to the necessary condition in Lemma 3.2 as the boundary condition.
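The boundary condition can be checked directly from the two definitions. The following self-contained sketch assumes each task is a hypothetical tuple `(D, d, k)` and derives the deadline set from the tasks; the function name is illustrative, and a task whose deadline does not reach into an interval is assumed to contribute nothing to it.

```python
def boundary_condition_holds(tasks, C):
    """Check mu_m^C(S) <= C * tau_m for all m in {0, 1, ..., L}."""
    taus = [0] + sorted({d for _, d, _ in tasks})  # tau_0 = 0 < tau_1 < ... < tau_L
    L = len(taus) - 1

    def lam(m):
        # lambda_m(S): maximum usable resource in [tau_{L-m} + 1, tau_L]
        # when the number of machines is unbounded.
        start = taus[L - m]
        total = 0
        for D, d, k in tasks:
            window = max(0, d - start)
            total += D if -(-D // k) <= window else k * window
        return total

    # lambda_m^C(S): the same quantity under the capacity constraint.
    lam_c = [0]
    for m in range(1, L + 1):
        capacity = C * (taus[L - m + 1] - taus[L - m])
        lam_c.append(lam_c[-1] + min(lam(m) - lam_c[-1], capacity))

    total_demand = sum(D for D, _, _ in tasks)
    # mu_m^C(S) = total demand - lambda_{L-m}^C(S).
    return all(total_demand - lam_c[L - m] <= C * taus[m] for m in range(L + 1))


if __name__ == "__main__":
    tasks = [(6, 4, 3), (4, 2, 2)]
    print(boundary_condition_holds(tasks, 2), boundary_condition_holds(tasks, 3))
```

In this example the total demand is 10 while two machines offer only 8 machine-slots up to time 4, so the condition fails for $C = 2$ and holds for $C = 3$.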


3.2 Scheduling Algorithm 

In this section, we propose the optimal scheduling algorithm LDF($\mathcal{S}$), presented as Algorithm 1; its optimality is proved in the next subsection.

The algorithm LDF($\mathcal{S}$) considers the tasks in the non-increasing order of the deadlines. For every task $T_i$ being considered, the resource allocation algorithm Allocate-B($i$), presented as Algorithm 2, is called to allocate $D_i$ resource to $T_i$ without violating the constraints from deadline and parallelism bound. Upon every completion of Allocate-B(·), it achieves a special optimal resource utilization state such that the currently fully allocated tasks $\mathcal{S}' \subseteq \mathcal{S}_L \cup \dots \cup \mathcal{S}_m$ will have been allocated $\lambda_l^C(\mathcal{S}')$ resource on $C$ machines in $[\tau_{L-l} + 1, d]$, $l \in [L]^+$. Such a state also ensures that when the next task $T_j$ is considered, Allocate-B($j$) is able to fully allocate $D_j$ resource to it iff $\mathcal{S}' \cup \{T_j\}$ satisfies the boundary condition. If so, LDF($\mathcal{S}$) can produce a feasible schedule only if $\mathcal{S}$ satisfies the boundary condition.

Algorithm 2: Allocate-B($i$)
1 Fully-Utilize($i$)
2 Fully-Allocate($i$)
3 AllocateRLM($i$, 1)

Algorithm 3: Allocate-A($i$)
1 Fully-Utilize($i$)
2 AllocateRLM($i$, 0)

Algorithm 4: Fully-Utilize($i$)
1 for $t \leftarrow d_i$ to 1 do
2     $y_i(t) \leftarrow \min\{k_i,\ D_i - \sum_{\bar{t}=t+1}^{d_i} y_i(\bar{t}),\ \overline{W}(t)\}$

To realize the function of Allocate-B($i$) above, the cooperation among three algorithms is needed: Fully-Utilize($i$), Fully-Allocate($i$), and AllocateRLM($i$, $\eta_1$), respectively presented as Algorithm 4, Algorithm 6, and Algorithm 7. Now, we introduce their executing process and show that Allocate-B($i$) can run correctly.

Notes. The call relations among algorithms are illustrated in Figure 3; the algorithm GreedyRLM will be introduced in the next section. Since Fully-Utilize($i$) and AllocateRLM($i$, $\eta_1$) also constitute Allocate-A($i$), presented as Algorithm 3, that is further called by GreedyRLM, we will also show the correctness of Allocate-A($i$) when introducing them to avoid explaining the same things repeatedly.

Fully-Utilize($i$). Fully-Utilize($i$) aims to ensure that a task $T_i$ fully utilizes the currently available machines at the time slots closest to its deadline, where $\min\{k_i,\ D_i - \sum_{\bar{t}=t+1}^{d_i} y_i(\bar{t})\}$ denotes the maximum number of machines it can or needs to utilize at $t$ under the parallelism constraint.

Lemma 3.3. Upon completion of Fully-Utilize($i$), if $\overline{W}(t) > 0$, it will have allocated $\min\{k_i,\ D_i - \sum_{\bar{t}=t+1}^{d_i} y_i(\bar{t})\}$ machines to $T_i$ at $t$; further, if $D_i - \sum_{\bar{t}=t}^{d_i} y_i(\bar{t}) > 0$, we have $y_i(t) = k_i$.

Proof. By contradiction. If $T_i$ is not allocated $\min\{k_i,\ D_i - \sum_{\bar{t}=t+1}^{d_i} y_i(\bar{t})\}$ machines at $t$ with $\overline{W}(t) > 0$, it should have utilized $\min\{\min\{k_i,\ D_i - \sum_{\bar{t}=t+1}^{d_i} y_i(\bar{t})\} - y_i(t),\ \overline{W}(t)\}$ more machines when being allocated at $t$. Further, if we also have $D_i - \sum_{\bar{t}=t}^{d_i} y_i(\bar{t}) > 0$, and $T_i$ is not allocated $k_i$ machines at $t$, $T_i$ should have been allocated $\min\{D_i - \sum_{\bar{t}=t}^{d_i} y_i(\bar{t}),\ \overline{W}(t),\ k_i - y_i(t)\}$ more machines at $t$. □
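To make the behavior of Fully-Utilize concrete, here is a minimal sketch of Algorithm 4, assuming the free machines per slot are kept in a hypothetical dictionary `free` that stands in for $\overline{W}(t)$. Updating `free` in place is a bookkeeping choice of this sketch; in the paper $\overline{W}(t)$ is derived from the allocations.

```python
def fully_utilize(D, d, k, free):
    """Sketch of Fully-Utilize for one task with demand D, deadline d, bound k.

    free[t] is the number of idle machines at slot t (1 <= t <= d); it is
    reduced in place as machines are handed to the task.
    Returns the allocation y as a dict {slot: machines}.
    """
    y = {}
    allocated = 0  # demand already placed in slots later than t
    for t in range(d, 0, -1):  # from the deadline towards slot 1
        y[t] = min(k, D - allocated, free[t])
        allocated += y[t]
        free[t] -= y[t]
    return y


if __name__ == "__main__":
    free = {1: 3, 2: 0, 3: 1, 4: 2}
    print(fully_utilize(5, 4, 2, free), free)
```

In the example, slot 1 still has a free machine afterwards, and the task indeed received there the full amount it still needed, in line with the first claim of Lemma 3.3.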

When a task $T_i$ has been allocated by Fully-Utilize($i$), we cannot guarantee that it is fully allocated $D_i$ resources. Hence, more algorithms are needed: Fully-Allocate($i$) and AllocateRLM($i$, $\eta_1$). We first introduce an algorithm Routine($\Delta$, $\eta_1$, $\eta_2$), presented as Algorithm 5, that will be called in these two algorithms.


Algorithm 5: Routine($\Delta$, $\eta_1$, $\eta_2$)
1 while $\overline{W}(t) < \Delta$ do
2     $t' \leftarrow$ the time slot earlier than and closest to $t$ so that $\overline{W}(t') > 0$
3     if $\eta_1 = 1$ then
4         if there exists no such $t'$ then
5             break
6     else
7         if $t' < t_m^{th}$, or there exists no such $t'$ then
8             break
9     if $\eta_2 = 1 \wedge \sum_{\bar{t}=1}^{t'-1} y_i(\bar{t}) \le \overline{W}(t)$ then
10         break
11     let $i'$ be a task such that $y_{i'}(t) > y_{i'}(t')$
12     $y_{i'}(t) \leftarrow y_{i'}(t) - 1$, $y_{i'}(t') \leftarrow y_{i'}(t') + 1$

Routine($\Delta$, $\eta_1$, $\eta_2$). Routine(·) focuses on the resource allocation at a single time slot $t$, and aims to increase the number of available machines $\overline{W}(t)$ at $t$ to $\Delta$ by transferring the allocation of other tasks to an earlier time slot. Here, the parameters $\eta_1$ and $\eta_2$ decide the exit condition of the loop in Routine(·). The operation at line 12 of Algorithm 5 does not change the total allocation to $T_{i'}$, nor does it violate the parallelism bound $k_{i'}$ of $T_{i'}$, since the current $y_{i'}(t')$ is no more than the initial $y_{i'}(t)$. The existence of $T_{i'}$ will be explained when we introduce Fully-Allocate($i$) and AllocateRLM($i$, $\eta_1$), where $\eta_1$ and $\eta_2$ have different values.

Fully-Allocate($i$). Fully-Allocate($i$) ensures that $T_i$ is fully allocated. Upon completion of Fully-Utilize($i$), we use $\Omega = D_i - \sum_{\bar{t} \le d_i} y_i(\bar{t})$ to denote the partial demand of $T_i$ that remains to be allocated more resources for full completion of $T_i$. For every time slot $t$, Fully-Allocate($i$) checks whether or not $\Omega > 0$ and $T_i$ can be allocated more machines at this time slot, namely, $k_i - y_i(t) > 0$; then, if $\Delta > 0$ it attempts to make the number of available machines at $t$ become $\Delta$ by calling Routine($\Delta$, 1, 0). Subsequently, it updates $\Omega$ to be $\Omega - \overline{W}(t)$, and allocates the currently available machines $\overline{W}(t)$ at $t$ to $T_i$. Upon completion of each loop iteration at $t$, $\overline{W}(t) = 0$ if it has increased the allocation of $T_i$ at $t$; in this case we will also see that $\overline{W}(t) = 0$ just before the execution of this loop iteration at $t$.

We now explain the reason for the existence of $T_{i'}$ in line 11 of Routine(·).

Algorithm 6: Fully-Allocate($i$)
1 $t \leftarrow d_i$, $\Omega \leftarrow D_i - \sum_{\bar{t} \le d_i} y_i(\bar{t})$
2 while $\Omega > 0$ do
3     $\Delta \leftarrow \min\{k_i - y_i(t),\ \Omega\}$
4     Routine($\Delta$, 1, 0)
5     $\Omega \leftarrow \Omega - \overline{W}(t)$, $y_i(t) \leftarrow y_i(t) + \overline{W}(t)$
6     $t \leftarrow t - 1$

Lemma 3.4. Fully-Allocate($i$) will never decrease the allocation $y_i(t)$ of $T_i$ at every time slot $t$ done by Fully-Utilize($i$); if $\overline{W}(t) > 0$ upon its completion, we also have that $\overline{W}(t) > 0$ just before the execution of every loop iteration of Fully-Allocate($i$).

Proof. The only operations of changing the allocation of tasks occur in line 5 of Fully-Allocate($i$) and line 12 of Routine(·). In those processes, there is no operation that will decrease the allocation of $T_i$. In line 12 of Routine(·), the allocation to $T_{i'}$ at $t'$ will be increased and the allocation to $T_{i'}$ at $t$ is reduced, so $W(t')$ is increased and $W(t)$ is reduced. However, in line 5 of Fully-Allocate($i$), $\overline{W}(t)$ will become zero and $W(t) = C$. Hence, the allocation $W(t)$ at $t$ is also not decreased upon completion of a loop iteration of Fully-Allocate($i$). The lemma holds. □


According to Lemmas 3.3 and 3.4, we make the following observation. At the beginning of every loop iteration of Fully-Allocate($i$), if $\Delta > 0$, we have that $\overline{W}(t) = 0$ since the current allocation of $T_i$ at $t$ is still the one done by Fully-Utilize($i$) and $\Omega > 0$; otherwise, it should have been allocated some more machines at $t$. If there exists a $t'$ such that $\overline{W}(t') > 0$ in the loop of Routine(·), since the allocation of $T_i$ at $t'$ now is still the one done by Fully-Utilize($i$) and $\Omega > 0$, we know that $y_i(t') = k_i$. Then, we have that $W(t) - y_i(t) > W(t') - y_i(t')$ and there exists a task $T_{i'}$ such that $y_{i'}(t') < y_{i'}(t)$; otherwise, we will not have that inequality. In the subsequent execution of the loop of Routine(·), $\overline{W}(t)$ becomes greater than 0 but $\overline{W}(t) < \Delta \le k_i - y_i(t)$. We still have $W(t) - y_i(t) = C - \overline{W}(t) - y_i(t) > W(t') - k_i = W(t') - y_i(t')$ and such a $T_{i'}$ can still be found.

Let $\omega$ denote the last time slot in which Fully-Allocate($i$) will increase the allocation of $T_i$. In other words, the allocation of $T_i$ at every time slot in $[1, \omega - 1]$ is still the one achieved by Fully-Utilize($i$).

Lemma 3.5. Upon completion of Fully-Allocate($i$), $y_i(t) = k_i$ for all $t \in [\omega + 1, d_i]$.

Proof. Suppose there exists a time slot $t \in [\omega + 1, d_i]$ at which the allocation of $T_i$ is less than $k_i$. By the definition of $\omega$, we have that $\Omega = D_i - \sum_{\bar{t}=1}^{d_i} y_i(\bar{t}) > 0$ and there also exists a time slot $t' \in [1, \omega - 1]$ that is not fully utilized upon the completion of the loop iteration of Fully-Allocate($i$) at $t$. This contradicts the stop condition of Routine(·). The lemma holds. □
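The interplay of Fully-Utilize, Routine($\Delta$, 1, 0) and Fully-Allocate can be illustrated on a tiny instance. The sketch below is a simplified reading of Algorithms 4 to 6 under stated assumptions: allocations are stored in a hypothetical dictionary `alloc` mapping a task id to `{slot: machines}`, only the parameter setting $\eta_1 = 1$, $\eta_2 = 0$ of Routine is covered, and two defensive guards (stopping when no movable task is found, and never handing out more than $\Delta$ machines) are additions of this sketch rather than steps of the paper.

```python
def free_at(alloc, C, t):
    """Number of idle machines at slot t (the complement of W(t))."""
    return C - sum(y.get(t, 0) for y in alloc.values())


def fully_utilize(alloc, C, i, D, d, k):
    """Greedily fill slots d, d-1, ..., 1 for task i (Algorithm 4)."""
    y = alloc.setdefault(i, {})
    for t in range(d, 0, -1):
        y[t] = min(k, D - sum(y.values()), free_at(alloc, C, t))


def routine(alloc, C, i, t, delta):
    """Routine(delta, 1, 0): free machines at slot t by moving one unit of
    another task from t to the closest earlier slot that has an idle machine."""
    while free_at(alloc, C, t) < delta:
        earlier = [u for u in range(t - 1, 0, -1) if free_at(alloc, C, u) > 0]
        if not earlier:
            break  # no earlier slot with an idle machine
        u = earlier[0]
        movers = [j for j, y in alloc.items()
                  if j != i and y.get(t, 0) > y.get(u, 0)]
        if not movers:
            break  # defensive guard added in this sketch
        j = movers[0]
        alloc[j][t] -= 1
        alloc[j][u] = alloc[j].get(u, 0) + 1


def fully_allocate(alloc, C, i, D, d, k):
    """Place the demand that Fully-Utilize could not place (Algorithm 6)."""
    y = alloc[i]
    omega = D - sum(y.values())
    t = d
    while omega > 0 and t >= 1:  # t >= 1 is a defensive guard of this sketch
        delta = min(k - y.get(t, 0), omega)
        routine(alloc, C, i, t, delta)
        gain = min(free_at(alloc, C, t), delta)  # capped at delta for safety
        y[t] = y.get(t, 0) + gain
        omega -= gain
        t -= 1
    return omega == 0


if __name__ == "__main__":
    C = 2
    alloc = {0: {3: 2}}  # task 0 (D=2, d=3, k=2) was scheduled earlier
    fully_utilize(alloc, C, 1, 3, 3, 1)  # task 1: D=3, d=3, k=1
    print(fully_allocate(alloc, C, 1, 3, 3, 1), alloc)
```

In this run, Fully-Utilize can only place two of the three units of task 1 because slot 3 is full; Fully-Allocate then moves one unit of task 0 from slot 3 to slot 2 and gives the freed machine to task 1.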


AllocateRLMd, r/i). Without changing the total allocation 
of Ti in [1, df\, AllocateRLMd, pi) takes the responsibility to 
make the time slots closest to di fully utilized by Ti and the 
other fully allocated tasks with the constraint of parallelism 
bound, namely, the Right time slots being Loaded Most. 

To that end, AllocateRLM(z, r/i) considers each time slot 
t from the deadline of Ti towards the earlier ones. For the 
current t being considered and given the current allocation 
yi(t), AllocateRLM(z, ?q) will transfer the earlier allocations 
of Ti to t and A = min{fc,; — yi(t), Vift)} denotes the 
maximum extra machines that Ti can utilize here. If A > 
0, we enter Routine(A, r/i, 772 )- Here, A > 0 also means 


Algorithm 7: AllocateRLM(i, 771 ) 

1 t i — di 

2 while Ui(t) > 0 do 

3 

A «- min-ffc., - y t (t), Jfj- 1 2/»(*)} 

4 

Routine(A, pi, 1) 

5 

8 £- W(t), yt{t) 4— yi(t) + W(t) 

6 

let t" be such a time slot that 21 - 7 1 S/» (^) < 0 and 

EUvS) > 6 

7 

0 - eLi 1 y&)> ydt") <- Vi{t") - 0 

8 

for t <— 1 to t" — 1 do 

9 

|_ Vi(J) 0 

10 

t i — t — 1 


$y_i(t) < k_i$. When Routine($\cdot$) stops, the number of available machines $\overline{W}(t)$ at $t$ is no more than $\Delta$. Let $u_t$ be the last time slot $t'$ considered in the loop of Routine($\cdot$) for $t$ such that the total allocation at $t'$ has been increased. In any other case, AllocateRLM($\cdot$) does nothing and has no effect on the allocation of $T_i$ at $t$; then set $u_t = u_{t+1}$ if $t < d_i$ and $u_t = d_i$ if $t = d_i$. Then, AllocateRLM(i, $\eta_1$) decreases the current allocation of $T_i$ at the earliest time slots in $[1, u_t - 1]$ to 0 (by $\overline{W}(t)$ in total), and accordingly increases the allocation of $T_i$ at $t$ by $\overline{W}(t)$. Here, upon completion of the loop iteration of AllocateRLM($\cdot$) at $t$, $\overline{W}(t) = 0$ if AllocateRLM($\cdot$) has taken an effect on the allocation of $T_i$ at $t$; in this case $\overline{W}(t)$ also equals 0 just before the execution of this loop iteration at $t$.
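The transfer step described above (lines 5-10 of AllocateRLM) can be illustrated with a minimal Python sketch. It covers only the final bookkeeping for a single task: `theta` units are added at slot `t` and removed from the task's earliest slots, so the total allocation is unchanged. The function name `shift_earliest_to`, the list-based representation of $y_i$, and passing `theta` as a parameter (instead of obtaining it from Routine) are assumptions made for illustration, not part of the paper.

```python
def shift_earliest_to(y, t, theta):
    """Move `theta` units of one task's allocation from its earliest slots to slot t.

    y[s - 1] is the task's allocation in time slot s (slots are 1-indexed).
    Hypothetical helper: assumes 0 <= theta <= sum of the allocation before t,
    and that the caller already checked the parallelism bound at t.
    """
    if theta < 0 or theta > sum(y[:t - 1]):
        raise ValueError("theta must lie between 0 and the allocation before t")
    y = list(y)                      # work on a copy
    y[t - 1] += theta                # load the right-most slot
    remaining = theta
    for s in range(t - 1):           # scan slots 1, 2, ..., t-1 from the earliest
        if remaining == 0:
            break
        moved = min(y[s], remaining)
        y[s] -= moved                # earliest slots are emptied first
        remaining -= moved
    return y


# Allocation [2, 1, 2, 0] over four slots; 3 machines are free at slot 4.
print(shift_earliest_to([2, 1, 2, 0], 4, 3))  # [0, 0, 2, 3]
```

In this example the first two slots are zeroed, which corresponds to $t'' = 2$ in line 6 of the algorithm: the allocation before slot 2 is smaller than $\theta = 3$, while the allocation up to slot 2 reaches it.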

Now, we begin to show the existence of $T_{i'}$ in line 11 of Routine($\cdot$). We first give some lemmas to help us identify the resource allocation state and define a parameter. Upon completion of the loop iteration in AllocateRLM($\cdot$) at $t$, if the allocation of $T_i$ at $t$ has ever been changed, let $v_t$ denote the time slot $t''$ in line 6 of AllocateRLM($\cdot$), where $\sum_{\bar{t}=1}^{v_t-1} y_i(\bar{t}) = 0$ and $v_t$ is the farthest time slot in which the allocation of $T_i$ is decreased. If $y_i(t)$ is not changed by AllocateRLM($\cdot$), set $v_t = v_{t+1}$ if $t < d_i$ and $v_t = 1$ if $t = d_i$.

Lemma 3.6. Upon completion of the loop iteration of AllocateRLM($\cdot$) at $t$, the allocation at every time slot in $[v_t + 1, d]$ is never decreased since the execution of AllocateRLM($\cdot$).

Proof. The only operations of decreasing the allocation of tasks occur in lines 6-10 of AllocateRLM($\cdot$) and line 12 of Routine($\cdot$). The operations of AllocateRLM($\cdot$) do not affect the allocation in $[v_t + 1, d]$ by the definition of $v_t$. Although the allocation at $t$ is decreased by Routine($\cdot$), it will finally become $C$ in line 5 of AllocateRLM($\cdot$). □

Lemma 3.7. From the beginning of the loop iteration of AllocateRLM($\cdot$) for $t$ until its completion, if there exists a time slot $t'' \in [v_t + 1, t - 1]$ such that $\overline{W}(t'') > 0$, we have that

(1) $v_{d_i} \le \cdots \le v_t \le u_t \le \cdots \le u_{d_i}$;

(2) $t''$ has never been fully utilized since the execution of AllocateRLM($\cdot$);

(3) the allocation of $T_i$ at every time slot in $[1, t - 1]$ has never been increased since the execution of AllocateRLM($\cdot$).

Proof. By the definition of $v_t$ and the way that AllocateRLM($\cdot$) decreases the allocation of $T_i$ (lines 6-10), we have $v_{d_i} \le \cdots \le v_t$. Further, according to the stop condition of
Routine($\cdot$) in line 9, we always have that $\sum_{\bar{t}=1}^{t'-1} y_i(\bar{t}) \ge \overline{W}(t)$ and further conclude that $v_t \le u_t$ since the allocation at $t'$ cannot be decreased by lines 6-10 of AllocateRLM($\cdot$). The allocation at $t'' \in [v_t + 1, t - 1]$ is not decreased by AllocateRLM($\cdot$) at any moment since $v_{d_i} \le \cdots \le v_t < t''$ and $t'' < t$. Lemma 3.7 (2) holds. By Lemma 3.7 (2), the $t'$ in the loop iteration for $t$ is also not fully utilized in the previous loop iteration. Hence, $u_t \le u_{\bar{t}}$ for a $\bar{t} \in [t + 1, d_i]$. Lemma 3.7 (1) holds. So far, AllocateRLM($\cdot$) only attempts to increase the allocation of $T_i$ in $[t, d_i]$ and there is no operation to increase its allocation in $[1, t - 1]$. Lemma 3.7 (3) holds. □

We next show the existence of T in line 11 of Routine(-) 
in the case that there exists a time slot t' earlier than t such 
that W{t') > 0 and Vtif) > 0. In Allocate- A(i), we 

can conclude that W(t') > 0 upon co mplet ion of Fully- 
Utilize(-) since t! > ut > Vt and by Lemma 3.6 We also have 
that r^vS) > 0 upon completion of F ully -Utilize(-) by 
Lemma |T7| (3) and Ui(t') = ki by Lemma 3.3 yi(t’) is still 
ki in the loop iteration of AllocateRLM(-) for / since t' > v t . 
By Lemma [3T| (3) and Lemma |3.3| and Lemma 3.6 we also 
have W{t) = 0 currently from ViZ) > 0. In Allocate- 
B(z), one more function Fully-Allocate(z) is called and if such 
t' exists when Routine(-) is called in AllocateRLM(-), we 
have t < uj when A > 0 and AllocateRLM(-) takes an 


3.3.1 Characterizing Allocate-X(i) 

We first describe the properties presented by Allocate-X(i), where X equals A or B. Allocate-A(i) is presented in Algorithm 3 and will be used in the greedy algorithm in the next section. Since Allocate-A(i) shares some properties with Allocate-B(i), we also show its properties here to avoid proving them repeatedly. Let $\mathcal{A}$ denote the set of the tasks that have been fully allocated so far excluding $T_i$.

Lemma 3.8. Upon every completion of the allocation algorithm Allocate-A(i) or Allocate-B(i), the workload $W(t)$ at every time slot is not decreased in contrast to the one just before the execution of this allocation algorithm. In other words, if $\overline{W}(t) > 0$ upon its completion, $\overline{W}(t) > 0$ just before its execution.

Proof. We observe the resource allocation state on the whole. Fully-Utilize(i) never changes the allocation of any $T_j \in \mathcal{A}$ at any time slot. To further prove Lemma 3.8 we only need to show that, in the subsequent execution of Allocate-A(i) or Allocate-B(i), the total allocation at every time slot $t$ is no less than the total allocation of $\mathcal{A}$ at $t$ upon completion of Fully-Utilize(i), i.e., $\sum_{T_j \in \mathcal{A}} y_j(t)$.

In Allocate-A(z), when AllocateRLM(z, yf) is dealing 
with a time slot t, it does not decrease the allocation at every 
time slot in [vd t 


1, d] by Lemma 


3.6 


effect on $y_i(t)$ by Lemma 3.5. We can come to the same conclusion in Allocate-B(i) that $y_i(t') = k_i$ and $\overline{W}(t) = 0$ using an additional Lemma 3.4. Finally, in both Allocate-A(i) and Allocate-B(i) we have the same observation as we have made in Fully-Allocate(i): $W(t) - y_i(t) > W(t') - y_i(t')$ and there must exist such a task $T_{i'}$ that $y_{i'}(t') < y_{i'}(t)$; otherwise, we cannot have that inequality. In the subsequent loop iterations of Routine($\cdot$), $\overline{W}(t)$ becomes greater than 0 but $\overline{W}(t) < \Delta \le k_i - y_i(t)$. We still have $W(t) - y_i(t) = C - \overline{W}(t) - y_i(t) > W(t') - k_i = W(t') - y_i(t')$. Such $T_{i'}$ can still be found.

Difference of Routine(-) and GreedyRTL. The operations 
in Routine(-) are the same as the ones in the inner loop 
of AllocateRTL(z) in GreedyRTL 131 and the differences are 
the exit conditions of the loop. In AllocateRTL(z), one exit 
condition is that there is no unsaturated time slot t' earlier 
than t. In this case, although GreedyRTL can guarantee the 
optimal resource utilization in a particular time interval 


where Vt < Vd,■ The 
only operations of decreasing the workload at t € [ 1. z'/J 
come form lines 6-10 of AllocateRLM(-) and they only 
decrease and change the allocation of T^. The allocation 
of Tj £ A at every time slot in [l,zzdj is still the one 
upon completion of Fully-Utilize(z) since we always have 
f > Ut > z;,z, in Routine(-) by Lemma 3.7 (1). Hence, the 
final workload of A at t £ [1, Vdf\ is at least the same as the 
one upon completion of Fully-Utilize(z). The lemma holds 
in Allocate-A(z). 

In Allocate-B(i), we need to additionally consider a call to Fully-Allocate(i). By Lemma 3.4 the lemma holds upon completion of Fully-Allocate(i). Since $y_i(t) = k_i$ for all $t \in [\omega + 1, d_i]$ by Lemma 3.5, we have that the subsequent call


according to the state we identified in Section 3.1 and our analysis in [17], there inevitably exist unsaturated time slots that are not optimally utilized. In fact, GreedyRTL achieves
a resource utilization of min{ , C A ^ +1 } 1 17 . 


Finally, we have the following conclusion: 

Theorem 3.1. A set of tasks $\mathcal{S}$ can be feasibly scheduled by LDF($\mathcal{S}$) on $C$ machines if and only if the boundary condition holds.

In other words, if LDF($\mathcal{S}$) cannot produce a feasible schedule for a set of tasks $\mathcal{S}$, then this set cannot be successfully scheduled by any algorithm; so, LDF($\mathcal{S}$) is optimal.

3.3 Proofs 

In this section, we prove Theorem 3.1 together with some conclusions for the greedy algorithm in the next section.


to AllocateRLM($\cdot$) will take no effect on the workload at $t \in [\omega + 1, d_i]$ and the lemma holds in $[\omega + 1, d_i]$ upon completion of Allocate-A(i). Since $T_i$ is allocated some more $\overline{W}(t)$ machines at $\omega$ in the loop iteration of Fully-Allocate(i) for $\omega$, $\overline{W}(t)$ becomes 0 and will also be 0 upon completion of AllocateRLM($\cdot$) as we described in its executing process. Upon completion of Fully-Allocate(i), the allocation of $T_i$ in $[1, \omega - 1]$ is still the one done by Fully-Utilize(i) by the definition of $\omega$ and the total allocation of $\mathcal{A}$ at every time slot in $[1, \omega - 1]$ is no less than the one upon completion of Fully-Utilize(i). Then, AllocateRLM($\cdot$) will change the allocation in $[1, \omega - 1]$ in the same way as it does in Allocate-A(i) and the lemma holds in $[1, \omega - 1]$. Hence, the lemma holds in Allocate-B(i). □

Lemma 3.9. Upon completion of Allocate-A(i) or Allocate-B(i), 
if there exists a time slot t" that satisfies: t" £ (here 

assume that Ti £ A m ) such that W(t") > 0 in Allocate-A(i), or 
t" £ [1, t) such that W(t") > 0 in Allocate-B(i), we have that 

(1) in the case that $len_i \le d_i - t'' + 1$, $\sum_{\bar{t}=1}^{t''-1} y_i(\bar{t}) = 0$;

(2) in the case that $len_i > d_i - t'' + 1$, $T_i$ is allocated $k_i$ machines at each time slot $t \in [t'', d_i]$;

(3) the total allocation of every $T_j \in \mathcal{A}$ in $[t'', d_j]$ is still the same as the one just before the execution of Allocate-A(i) or Allocate-B(i).

Proof. We observe the stop state of AllocateRLM($\cdot$). In the case that $len_i \le d_i - t'' + 1$, if $\sum_{\bar{t}=1}^{t''-1} y_i(\bar{t}) > 0$, there exists a time slot $t$ such that $y_i(t) < k_i$. The loop of Routine($\Delta$, $\eta_1$, $\eta_2$) for $t$ would not stop with the current state by the condition given in Algorithm 5 (also described in Chapter 3 and Section 4.1). Lemma 3.9 (1) holds. In the case that $len_i > d_i - t'' + 1$, we have that $\sum_{\bar{t}=1}^{t''-1} y_i(\bar{t}) > 0$ and there is no time slot $t$ such that $y_i(t) < k_i$; otherwise, the loop of Routine($\Delta$, $\eta_1$, $\eta_2$) for $t$ would not stop with the current state. Lemma 3.9 (2) holds.


Now, we prove Lemma 3.9 (3). In Allocate-A(i), we discuss two cases on $t''$ just before the execution of the loop iteration of AllocateRLM(i, $\eta_1$) at every $t \in [t'' + 1, d_i]$: $\overline{W}(t'') = 0$ and $\overline{W}(t'') > 0$. We will prove $t'' < t'$ in the loop of Routine($\cdot$) since the change of the allocation of $\mathcal{A}$ only occurs between $t$ and $t'$ for the task $T_{i'}$.

If there exists a certain loop iteration of AllocateRLM(-) 
at t £ [t"+l, di] such that W (f") = 0 initially but W(t") > 0 
upon completion of this loop iteration, this shows that there 
exist some operations that decrease the allocation at t". Such 
operations only occur in lines 6-10 of AllocateRLM(-) and 
we have that t" < Vt . Since v t < ut "+1 < • • • < uwe 
have that t" < t' in the loop iteration of Routine(-) for 
every t £ \t" + 1, dj] by Lemma 3.7 Here, we also have 
that AllocateRLM(-) will do nothing in its loop it erati on 
at t" since y-ft) = 0 then. Hence, Lemma [t9| (3) 

holds in this case. In the other case, W[t") > 0 just before 
the execution of every loop iteration of AllocateRLM(-) for 
every t £ \t" + 1, dj], and we have that t" < t'. Just before 
the execution of AllocateRLM(-), we also h ave that either 
Vi{t") = ki or 2 h{t) = 0 by Lemma 3.3 Then, the 


loop iteration of AllocateRLM(-) will take no effect on the 
allocation at t". Hence, Lemma [3.9| (3) also holds in this case. 

In Allocate-B(i), the additional function Fully-Allocate(i) 
will be called and we will discuss the positions of t" in 
[uj + 1, di]. If t" £ [w + 1, dj], AllocateRLM(-) will take no 
effect on the allocation at every time slot in \t", dj] and we 
have that W{t") > 0 upon completion of Fully-Allocate(i). 
By Lemma 3.4 and Lemma 3.3 we have either yi{t") = ki 

^t"-i 


or X]f=i Vi(f) = 0 upon completion of Fully-Utilize(i). In 
the latter case, the call to Fully-Allocate(i) will cannot take 
any effect on the allocation at every time slot in [1, dj] since 
12 = 0 there. In the former case, we always have t' > t" in 
the loop iteration of Fully-Allocate(i) for every t £ ]t" +1. dj] 
and Fully-Allocate(i) cannot change the allocation of any 
task at its loop iteration for t" . Hence, Lemma |3.9| (3) holds 
when $t'' \in [\omega + 1, d_i]$. If $t'' = \omega$, AllocateRLM($\cdot$) does not decrease the allocation at $\omega$. We also have that $\overline{W}(t'') > 0$ upon completion of Fully-Allocate(i), and then either $y_i(t'') = k_i$ or $\sum_{\bar{t}=1}^{t''-1} y_i(\bar{t}) = 0$. AllocateRLM($\cdot$) will also do nothing at $t$ (i.e., $\omega$) and Lemma 3.9 (3) holds when $t'' = \omega$.

If t" £ [l,w- 1], we first observe the effect of Fully- 
Allocate(i). In the case that W(t") = 0 upon completion 
of Fully-Allocate(i) but W{t") > 0 upon completion of 
AllocateRLM(-), we have that the allocation in [l,i"] have 
ever been decreased and there exists a time slot t' such that 


W(t') > 0 when AllocateRLM(-) is in its loop iteration at a 
certain t £ [1 ,w], where t < Ut < t! < u. By Lemma 3.6 


this Ut is also not fully utilized upon completion of Fully- 
Allocate(z). Further, by Lemma 3.4 upon completion of 
every loop iteration of Fully-Allocate(i) at t £ [l,dj], ut 
is not fully utilized and Lemma 3.9 (3) holds in this case. 
In the other case, W{t") > 0 upon completion of Fully- 
Allocate(z). By Lemma [3.3[ t" is not fully utilized upon every 
completion of its loop iteration at t £ [w, dj], W(t") > 0 
and the exchange of allocation of A only occurs in two 
time slots t and t' in Routine(-) that are in [t",dj]. Hence, 
Lemma |3.9| (3) holds in this case. Further, Lemma |3.9| (3) 
holds upon completion of Fully-Allocate(i). As we analyze 
in Allocate-A(z), Lemma |3.9|(3) still holds upon completion 
of AllocateRLM(-). □ 

Lemma 3.10. The time complexity of Allocate-X($\cdot$) is $O(n)$.


Proof. The time complexity of Allocate-B(i) depends on Fully-Allocate(i) or AllocateRLM($\cdot$); the complexity of Allocate-A(i) depends on AllocateRLM($\cdot$). In the worst case, Fully-Allocate(i) and AllocateRLM(i) have the same time complexity from the execution of Routine($\cdot$) for every time slot $t \in [1, d_i]$. In AllocateRLM($\cdot$) for every task $T_i \in \mathcal{T}$, each loop iteration at $t \in [1, d_i]$ needs to seek the time slot $t'$ and the task $T_{i'}$ at most $D_i$ times. The time complexities of respectively seeking $t'$ and $T_{i'}$ are $O(d)$ and $O(n)$; the maximum of these two complexities is $\max\{d, n\}$. Since $d_i \le d$ and $D_i \le D$, we have that both the time complexities of Allocate-A(i) and Allocate-B(i) are $O(dD \max\{d, n\})$. Since the parameters $d$ and $k_i$ are finite in practice [14], we have $O(dD \max\{d, n\}) = O(n)$. □


3.3.2 Optimality of LDF(S) 


Theorem 3.1 holds from Lemma 3.2 and Proposition 3.1 that will be introduced in the following. Recall that $\mathcal{A}$ always denotes the set of tasks that have been fully allocated so far excluding $T_i$.


Lemma 3.11. Let $m \in [L]^+$. Suppose $T_i \in \mathcal{S}_{L-m+1}$ is about to be allocated. If we have the relation that $\overline{W}(1) \ge \overline{W}(2) \ge \cdots \ge \overline{W}(\tau_{L-m+1})$ before the execution of Allocate-B(i), such relation on the available machines at each time slot still holds after the completion of Allocate-B(i).


Proof. We observe the executing process of Allocate-B(i). Allocate-B(i) will call three functions Fully-Utilize(i), Fully-Allocate(i) and AllocateRLM(i, 1). In every call to those functions, the time slots $t$ will be considered from the deadline of $T_i$ towards earlier time slots. During the execution of Fully-Utilize(i), the allocation to $T_i$ at $t$ is $y_i(t) = \min\{k_i, D_i - \sum_{\bar{t}=t+1}^{d_i} y_i(\bar{t}), \overline{W}(t)\}$. Before time slots $t$ and $t + 1$ are considered, we have $\overline{W}(t) \ge \overline{W}(t + 1)$. Then, after those two time slots are considered, we still have $\overline{W}(t) \ge \overline{W}(t + 1)$.
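The allocation rule of Fully-Utilize quoted in this proof can be sketched in a few lines of Python. This is an illustration for a single task only; the function name `fully_utilize` and the `load` list (the total allocation already placed in each slot by other tasks) are assumed names, not taken from the paper.

```python
def fully_utilize(capacity, load, deadline, demand, k):
    """Right-to-left allocation of one task, following the min{...} rule above.

    capacity: number of machines C
    load:     load[t - 1] is the total allocation already in slot t (1-indexed)
    deadline: d_i, demand: D_i, k: parallelism bound k_i
    Returns y with y[t - 1] = machines given to the task in slot t.
    """
    y = [0] * deadline
    remaining = demand                        # D_i minus what later slots already got
    for t in range(deadline, 0, -1):          # from the deadline towards slot 1
        available = capacity - load[t - 1]    # available machines at slot t
        y[t - 1] = min(k, remaining, available)
        remaining -= y[t - 1]
    return y


load = [0, 1, 3, 4]                           # available machines: 4, 3, 1, 0
y = fully_utilize(capacity=4, load=load, deadline=4, demand=5, k=2)
print(y)                                      # [2, 2, 1, 0]
print([4 - l - a for l, a in zip(load, y)])   # [2, 1, 0, 0]
```

In this small instance the available machines are non-increasing in $t$ both before and after the call, which is the relation the lemma is concerned with.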

During the execution of Allocate-B(i), let $t'$ always denote the current time slot such that $\overline{W}(t') > 0$ and $\overline{W}(t' + 1) = 0$ if such time slot exists. $t'$ is also unique when the relation on the available machines holds. By Lemma 3.3, if $\Omega > 0$ at the very beginning of the execution of Fully-Allocate(i), we have $y_i(1) = \cdots = y_i(t') = k_i$ upon completion of Fully-Utilize(i) and the allocation of $T_i$ at every time slot in $[1, t']$ will not be changed by Fully-Allocate(i) since $\Delta = 0$ then. When Fully-Allocate(i) is considering a time slot $t$ ($t > t'$), it will transfer partial allocation of $T_{i'}$ at $t$ to the time slot $t'$. If $t'$ becomes fully utilized, $t' - 1$ becomes the current $t'$ and Routine($\cdot$) will make time slots fully utilized one by one from $t'$ towards earlier time slots. In addition, the time slot $t$ will again become fully utilized by line 5 of Fully-Allocate(i), i.e., $\overline{W}(t) = 0$. The allocation at every time slot in $[1, t' - 1]$ is still the one upon completion of Fully-Utilize(i) and the allocation at $t'$ is never decreased. Hence, the relation on the available machines still holds upon completion of every loop iteration of Fully-Allocate(i) for $t \in [1, d_i]$.

From the above, we have the following facts upon completion of Fully-Allocate(i): (1) the allocation of $T_i$ in $[1, t']$ is still the one upon completion of Fully-Utilize(i); (2) the allocation of $\mathcal{A}$ in $[1, t' - 1]$ is still the one just before the execution of Allocate-B(i), and (3) the allocation of $\mathcal{A}$ at $t'$ is not decreased in contrast to the one just before the execution of Allocate-B(i). Hence, we have that $C - \sum_{T_j \in \mathcal{A}} y_j(1) \ge C - \sum_{T_j \in \mathcal{A}} y_j(2) \ge \cdots \ge C - \sum_{T_j \in \mathcal{A}} y_j(t')$.

Upon completion of every loop iteration of AllocateRLM($\cdot$) at $t \in [t' + 1, d_i]$, $u_t = t'$ or $t' + 1$. When AllocateRLM($\cdot$) is considering $t$, the time slots from $t'$ towards earlier time slots will become fully utilized in the same way as Fully-Allocate(i). Hence, the relation on the number of available machines holds obviously in $[t' + 1, d_i]$ since every time slot there is fully utilized. Since $v_t \le u_t$ by Lemma 3.7, the allocation of $\mathcal{A}$ in $[1, v_t]$ has not been changed since the execution of AllocateRLM($\cdot$), and we still have that the relation on the number of available machines holds in $[1, v_t]$ in both the case where $v_t < t'$ and the case where $v_t = t'$. Here, if $v_t < t'$, the allocation at $t'$ is never decreased by AllocateRLM($\cdot$) and further if $t' - 1 \ge v_t + 1$, the allocation at every $t \in [v_t + 1, t' - 1]$ has also not been changed so far by AllocateRLM($\cdot$). Hence, we can conclude that Lemma 3.11 holds upon completion of the loop iteration at $t' + 1$. Then, if $t' = v_{t'+1}$, AllocateRLM($\cdot$) will not take an effect on the allocation in $[1, t']$ and the lemma holds naturally. Otherwise, $t' > v_{t'+1}$ and the allocation of $T_i$ in $[v_{t'+1} + 1, t']$ is still the one done by Fully-Utilize(i) and the allocation in $[v_{t'+1} + 1, t']$ is also not decreased in contrast to the one upon completion of Fully-Utilize(i) (i.e., $\overline{W}(t) > 0$ then). By Lemma 3.3, we conclude that AllocateRLM($\cdot$) will not take an effect on the allocation in $[1, t']$ and the lemma holds upon completion of AllocateRLM($\cdot$). □


Lemma 3.12. Upon completion of Allocate-B(i) for a task $T_i \in \mathcal{S}_{L-m+1}$ ($m \in [L]^+$), we have that

(1) Let $t_1, t_2 \in [1, \tau_{L-m+1}]$ and $t_1 < t_2$; then, $\overline{W}(t_1) \ge \overline{W}(t_2)$;

(2) $T_i$ is fully allocated;

(3) A^(Al U {Ti}) resources in [• tl-j + 1 ,tl\ have been 
allocated to A U {Ti}, j £ [L} + ). 


Proof. By induction. Initially, $\mathcal{A} = \emptyset$. When the first task $T_i = T_{L,1}$ in $\mathcal{S}_L$ is being allocated, Lemma 3.12 (1) holds by Lemma 3.11. Since Algorithm 2 will allocate $\min\{k_i, D_i - \sum_{\bar{t}=t+1}^{d_i} y_i(\bar{t}), \overline{W}(t)\}$ machines to $T_i$ from its deadline towards earlier time slots, and the single task can definitely be fully allocated, the lemma holds. We assume that when the first $l$ tasks in $\mathcal{S}_L$ have been fully allocated, this lemma holds.

Assume that this lemma holds just before the execution of Allocate-B(i) for a task $T_i \in \mathcal{S}_{L-m+1}$. We now show that this lemma also holds upon completion of Allocate-B(i). By Lemma 3.11, Lemma 3.12 (1) holds upon completion of Allocate-B(i). Allocate-B(i) makes no change to the allocation of $\mathcal{A}$ in $[\tau_{L-m+1} + 1, \tau_L]$ due to the deadline $d_i$ and Lemma 3.12 (3) holds in the case that $j \in [m - 1]^+$ by the assumption. Here, if $m = 1$, the conclusion above holds trivially. Let $t'$ always denote the current time slot such that $\overline{W}(t') > 0$ and $\overline{W}(t' + 1) = 0$ if such time slot exists. If such time slot does not exist upon completion of Allocate-B(i), $T_i$ has been fully allocated since $\mathcal{S}$ satisfies the boundary condition. Now, we discuss the case that $\overline{W}(1) > 0$ upon completion of Allocate-B(i). By Lemma 3.9, we know that $T_i$ has also been fully allocated, and Lemma 3.12 (2) holds upon completion of Allocate-B(i).
Assume that $t' \in [\tau_{L-l'} + 1, \tau_{L-l'+1}]$. By the definition of $t'$, Lemma 3.12 (3) holds in the case that $m \le j \le l' - 1$ obviously. By Lemma 3.9, $T_i$ has already optimally utilized the resource in $[\tau_{L-j} + 1, \tau_L]$ for all $l' \le j \le L$ and so has the set $\mathcal{A}$ together with the assumption. Lemma 3.12 (3) holds. □


Proposition 3.1. The boundary condition is sufficient for LDF($\mathcal{S}$) to produce a feasible schedule for a set of malleable tasks with deadlines $\mathcal{S}$; the time complexity of LDF($\mathcal{S}$) is $O(n^2)$.


Proof. By Lemma 3.12 (2) and Lemma 3.10, the proposition holds when all the tasks in $\mathcal{S}$ have been considered in the algorithm LDF($\mathcal{S}$). □


4 Applications 

In this section, we illustrate the applications of the results in Section 3 to (i) two algorithmic design techniques for the social welfare maximization objective in [2], [3], giving the optimal greedy algorithm and the first exact dynamic programming algorithm, and (ii) two other objectives.

4.1 Greedy Algorithm 

Greedy algorithms are often the first algorithms one 
considers for many optimization problems. 

4.1.1 Performance Bound 

In terms of the maximization problem, the general form of a greedy algorithm is as follows [18], [19]: it tries to build a solution by iteratively executing the following steps until no item remains to be considered in a set of items: (1) selection standard: in a greedy way, choose and consider an item that is locally optimal according to a simple criterion at the current stage; (2) feasibility condition: for the item being considered, accept it if it satisfies a certain condition such that this item constitutes a feasible solution together with the items that have been accepted so far under the constraints of this problem, and reject it otherwise. Here, an item that has been considered and rejected will never be considered again. The selection criterion is related to the objective function and constraints, and is usually the ratio of 'advantage' to 'cost', measuring the efficiency of an item. In the problem of this paper, the constraint comes from the capacity to hold the chosen tasks and the objective is to maximize the social welfare; therefore, the selection criterion here is the ratio of the value of a task to its demand.

Given the general form of a greedy algorithm, we define a class GREEDY of algorithms that operate as follows:

1) consider the tasks in the non-increasing order of the marginal value;

2) let $\mathcal{A}$ denote the set of the tasks that have been accepted so far, and, for a task $T_i$ being considered, it is accepted and fully allocated iff there exists a feasible schedule for $\mathcal{A} \cup \{T_i\}$.


In the following, we refer to the generic algorithm in GREEDY as Greedy.
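The two steps that define the class GREEDY can be sketched as follows. This is a minimal illustration in which the feasibility test is passed in as a callable `is_feasible`; in the setting of this paper that role would be played by a check such as the boundary condition or a run of LDF, which is not implemented here. The `Task` record, the function name `greedy`, and the toy oracle in the usage lines are assumptions for illustration only.

```python
from typing import Callable, List, NamedTuple


class Task(NamedTuple):
    value: float    # v_i
    demand: int     # D_i
    deadline: int   # d_i
    k: int          # parallelism bound k_i


def greedy(tasks: List[Task],
           is_feasible: Callable[[List[Task]], bool]) -> List[Task]:
    """Generic member of GREEDY with a pluggable feasibility oracle."""
    # Step 1: non-increasing order of marginal value (value per unit of demand).
    order = sorted(tasks, key=lambda task: task.value / task.demand, reverse=True)
    accepted: List[Task] = []
    for task in order:
        # Step 2: accept iff the accepted set plus this task stays feasible;
        # a rejected task is never reconsidered.
        if is_feasible(accepted + [task]):
            accepted.append(task)
    return accepted


# Toy oracle (NOT the paper's condition): total demand must not exceed 4.
a, b, c = Task(10, 2, 2, 1), Task(9, 3, 3, 1), Task(4, 1, 1, 1)
chosen = greedy([a, b, c], lambda s: sum(t.demand for t in s) <= 4)
print(chosen == [a, c])  # True
```

Keeping the oracle as a parameter mirrors the definition above, which only requires that some feasible schedule exists for the accepted set and does not prescribe how that is decided.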


Proposition 4.1. The best performance guarantee that a greedy algorithm in GREEDY can achieve is $\frac{s-1}{s}$.


Proof. Let us consider a special instance: (i) let $\mathcal{D}_i = \{T_j \in \mathcal{T} \mid d_j = d'_i\}$, where $i \in [2]^+$, $d'_2$ and $d'_1 \in \mathbb{Z}^+$, and $d'_2 > d'_1$; (ii) for all $T_j \in \mathcal{D}_1$, $v'_j = 1 + \epsilon$, $D_j = 1$, $k_j = 1$, and there is a total of $C \cdot d'_1$ such tasks, where $\epsilon \in (0, 1)$ is small enough; (iii) for all $T_j \in \mathcal{D}_2$, $v'_j = 1$, $k_j = 1$ and $D_j = d'_2 - d'_1 + 1$. GREEDY will always fully allocate resource to the tasks in $\mathcal{D}_1$, with all the tasks in $\mathcal{D}_2$ rejected. The performance guarantee of GREEDY will be no more than

$$\frac{C \cdot d'_1}{C \cdot [(1 + \epsilon)(d'_1 - 1) + 1 \cdot (d'_2 - d'_1 + 1)]}$$


. Further, with e —> 0, 
this performance guarantee approaches . In this instance, 

s = d ‘f, ,, and . When d 2 —> + 00 , = £ —^. 

Hence, the proposition holds. □ 
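To make the bound in this instance concrete, the expression $C \cdot d'_1 / (C \cdot [(1+\epsilon)(d'_1 - 1) + (d'_2 - d'_1 + 1)])$ from the proof can be evaluated numerically. The sketch below only evaluates that expression; the function name `greedy_ratio_bound` and the sample values are hypothetical and chosen for illustration.

```python
def greedy_ratio_bound(C, d1, d2, eps):
    """Evaluate C*d1 / (C*[(1+eps)*(d1-1) + (d2-d1+1)]) for the instance above.

    d1 and d2 stand for the two deadlines d'_1 < d'_2 of the instance.
    """
    numerator = C * d1
    denominator = C * ((1 + eps) * (d1 - 1) + (d2 - d1 + 1))
    return numerator / denominator


# With eps = 0 the expression reduces, by plain arithmetic, to d1 / d2.
print(greedy_ratio_bound(C=3, d1=4, d2=10, eps=0.0))
# A small positive eps gives a slightly smaller value.
print(greedy_ratio_bound(C=3, d1=4, d2=10, eps=0.01))
```

The number of machines $C$ cancels, so the value depends only on the two deadlines and $\epsilon$.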


4.1.2 A New Algorithmic Analysis 

We show that as soon as the resource allocation of tasks 
(accepted to be completed by Greedy) satisfies two features 
we define here, its performance guarantee can be deduced 
immediately. 

To describe the resource allocation process of a greedy algorithm, we define the sets of consecutive accepted (i.e., fully allocated) and rejected tasks $\mathcal{A}_1, \mathcal{R}_1, \mathcal{A}_2, \cdots$. Specifically, let $\mathcal{A}_m = \{T_{i_m}, T_{i_m+1}, \cdots, T_{j_m-1}\}$ be the $m$-th set of all the adjacent tasks that are fully allocated after the task $T_{j_{m-1}}$, where $T_{j_m}$ is the first rejected task following the set $\mathcal{A}_m$. Correspondingly, $\mathcal{R}_m = \{T_{j_m}, \cdots, T_{i_{m+1}-1}\}$ is the $m$-th set of all the adjacent rejected tasks following the set $\mathcal{A}_m$, where $m \in [K]^+$ for some integer $K$ and $i_1 = 1$. Integer $K$ represents the last step: in the $K$-th step, $\mathcal{A}_K \neq \emptyset$ and $\mathcal{R}_K$ can be empty or non-empty. We also define $c_m = \max_{T_i \in \mathcal{R}_1 \cup \cdots \cup \mathcal{R}_m} \{d_i\}$ and $c'_m = \max_{T_i \in \mathcal{A}_1 \cup \cdots \cup \mathcal{A}_m} \{d_i\}$. In the following, we refer to this generic greedy algorithm as Greedy. While the tasks in $\mathcal{A}_m \cup \mathcal{R}_m$ are being considered, we refer to Greedy as being in the $m$-th phase. Before the execution of Greedy, we refer to it as being in the 0-th phase. In the $m$-th phase, upon completion of the resource allocation to a task $T_i \in \mathcal{A}_m \cup \mathcal{R}_m$, we define $D_{m,i}^{[t_1,t_2]} = \sum_{\bar{t}=t_1}^{t_2} y_i(\bar{t})$ to describe the current total allocation to $T_i$ over $[t_1, t_2]$. After the completion of Greedy, we also define $D_{K+1,i}^{[t_1,t_2]} = \sum_{\bar{t}=t_1}^{t_2} y_i(\bar{t})$ to describe the final total allocation to $T_i$ over $[t_1, t_2]$. We further define $T_{m,i}^{[t_1,t_2]}$ as an imaginary task with characteristics $\{v_{m,i}^{[t_1,t_2]}, D_{m,i}^{[t_1,t_2]}, d_{m,i}^{[t_1,t_2]}, k_i\}$, where $v_{m,i}^{[t_1,t_2]} = v_i D_{m,i}^{[t_1,t_2]} / D_i$ and $d_{m,i}^{[t_1,t_2]} = \min\{t_2, d_i\}$.

Features and theorem. Upon completion of the $m$-th phase of Greedy, we define the threshold parameter $t_m^{th}$ as follows. If $c_m \ge c'_m$, then set $t_m^{th} = c_m$. If $c_m < c'_m$, then set $t_m^{th}$ to a certain time slot in $[c_m, c'_m]$. We emphasize here that $d_i \le t_m^{th}$ for all $T_i \in \cup_{j=1}^{m} \mathcal{R}_j$; hence the allocation to the tasks of $\cup_{j=1}^{m} \mathcal{R}_j$ in $[t_m^{th} + 1, d]$ is ineffective and yields no value due to the constraint from the deadline. For ease of exposition, we let $t_0^{th} = 0$ and $t_{K+1}^{th} = d$. With this notation, we define the following two features that we will want the resource allocation to satisfy for all $m \in [K]^+$:


Feature 4.1. The resource utilization achieved by the set of tasks $\cup_{j=1}^{m} \mathcal{A}_j$ in $[1, t_m^{th}]$ is at least $r$, i.e., $\sum_{T_i \in \cup_{j=1}^{m} \mathcal{A}_j} D_{K+1,i}^{[1, t_m^{th}]} / (C \cdot t_m^{th}) \ge r$.

Viewing TfAf as a real task with the same allocation 
as that of X) in [l,tm+i\ done by Greedy, we define the 
second feature as: 

Feature 4.2. + l,t* +1 ] is optimally utilized by {X^™+^ 

/; c ->r i-d .i. 

Next, we have the following conclusion. 


Theorem 4.1. If Greedy achieves a resource allocation structure that satisfies Feature 4.1 and Feature 4.2, it gives an $r$-approximation to the optimal social welfare.


In the rest, we prove Theorem 4.1. For ease of the subsequent exposition, we add a dummy time slot 0 but the task $T_i \in \mathcal{T}$ cannot get any resource there, that is, $y_i(0) = 0$ forever. We also let $\mathcal{A}_0 = \mathcal{R}_0 = \mathcal{A}_{K+1} = \mathcal{R}_{K+1} = \emptyset$.


Scheduling tasks with relaxed constraints. Our analysis 

[1 t th 1 

is retrospective. We will here treat Tx+ip. as a real task, 
and consider the welfare maximization problem through 
scheduling the following tasks: 


(1) r £’ c+i] = {T [ K+if \ T * e u r=o a, }, 

(2) Tm+1 = {Tl'Zif | Ti £ uf+i +1 Aj}, 

(3) U = Gf+ 0 1 n r 


Here, m £ [A'] and we relax several restrictions on the tasks. 
Specifically, partial execution can yield linearly proportional 
value, that is, if X t is allocated Ef=i Vi(t) < T>i resources 

by its deadline, a value will be added to the 

social welfare. The parallelism bound ki of the tasks X) in 

Xm+i U J\f is reset to be C. 

_ [1 t th I 

Let 7^„ + i = 7m ’ m+1 U Tm +1 U A f. Denote by 

OPT^ 1,tm +d and OPT^ m+1,tm +d the optimal social welfare 
by scheduling T m +i in the segmented timescales [l,tm+i] 
and [t% + 1 Am+i}- The connection of the problem above 
with our original problem is that, with the above relaxation 
of some constraints of tasks, OPT^ 1,tK +d is an upper bound 
of the optimal social welfare in our original problem. In 
the following, we will bound OPT ^ 1,tm +d by bounding 
OPXl 1 ’*™! and OPX^ m+1,t ’ T *+ 1 ^, m £ [K] + . In the special 
case where t/f; = t% +1 , OPT P'Cl > OPT^+d since 

7m+l = Tm, Tm-T ' C Tm’ tm+1 \ and the tasks in P m +i are 
more relaxed than +' ’'' m+1 ■ hence we can directly bound
OPT! 1,4 ™! in order to bound OPT^ 1 ' t <^+ 1 \ We therefore 
assume that tfr +1 > tfr subsequently. The lemma below 
shows that the bound on OPT ^’ tK +^ can be obtained 


through bounding each OPT 




4 (to £ [A']): 


Lemma 4.1. OPT [1 ++i] < OPT 1L+ + OPT l + +1 ’4 +1 ] 
(to £ [K]+). 

Proof. Consider an optimal schedule achieving OPT^ 0,tm + *1. 

r-t +th ] n 4.th i 


If a task T. 


than D 


[i,t* 


JC+l,i 


£ 7+ 


U p 


m+1 


K'+l,* 


.[4+1,4+ 


Feature [42] the lemma holds. 


For ease of exposition, let fFfr +1 = {+++ |Tj £ 


.,[1,4* 


Proof. With the constraint of deadlines, the allocation of A(“ 
in [tfr + 1,++] yields no value due to di < tfr. As a 
result of the order that Greedy considers tasks, the tasks 

of 7m ’ m+ will have high er m arginal values than the tasks 
in *^rn U J~ m+1- By Feature 


4.2 


the lemma holds. 


□ 


is allocated more 

j.th 1 


K+l i resources in the time slot interval [l,i+ we 


transfer the part larger than P+’+l (at most 
to [tfr + 1 Pm+i] an d Lr ^e meantime transfer to [1,+] 
partial allocation of Af in [tfr + l,f„ + i] if the allocation 
transfer in the former needs to occupy the allocation of Af. 

[l,t ±h ] 

The former is feasible in that the total allocation of T, ’ m+1 


We now bound OP 2 " , [ 1 > t m+ (m £ [AT — 1]) using the 
allocation of 7+1 specified above. 

When there are C machines, the total resource available 
in the time slot interval [1, d] can be viewed as an area with a 
length d and a width C. With abuse of notation and only in 
the rest of this proof, whenever A denotes a set of allocated 
tasks, we use the capital A to denote both the area occupied 
by the allocation of the tasks of A in some interval and the 
size of this area. 


,C K fromT l K'l f 


K+l,i 

in [tfr + 1, tfr +1 ] can be up to D+-, f m+1 without violating 
the constraints of the deadline and parallelism bound of 

T. 


as Greedy has done before; the latter is feasible 
since the parallelism bound there is C. Then, this optimal 
schedule can be transformed into a feasible schedule of T m 
in [1, tfr\ plus a feasible schedule of 7+i in [tfr + 1, t+J. 
The lemma holds. □ 

Bounding time slot intervals. Now, we consider the fol¬ 
lowing schedule of 7+1 (to £ [AT]). Whenever 7 +1 is 
concerned, the allocation to the tasks of 7+i at every time 
slot t £ [1 , i+J is t£Ji .j = Uiit), as is done by Greedy in 
[1, f+i] for the set of tasks T. Note that the tasks in Af are 
all rejected. We will study how to bound OPt 4 +1 ’++ 
using the above allocation of 7+i. 

We first observe, in the next two lemmas, what schedule 
can achieve OPT^ tm+1 ’ tm + 1 1. 


Lemma 4.4. There exist K areas (71+2, 

U Ffr such that 

(1) the size of each C m+ i is r-C■ (++ — tfr) ( m G [AT— 1]); 

[1 t ] 

(2) every C m +i is obtained from the area Tfr’ m+1 U Ffr +1 — 
SjLo Cj, where Co = 0, and is the part of that area with 
the maximum marginal value. 


Proof. When to = 0, by Feature 


4.1 


E 


Tie A 


D 


[Mi 


K+l,i - 


> r-C- 


tfr and the lemma holds. Assume that when to < l (l > 0), 
the lemma holds. N ow, we prove the lemma holds when 

-lhA+2] I | T 7 >+ \ J.tk 


4.1 


l~\~ 1 


m, = 1 + 1. By Feature 

assumption, Ej=o + = r-C++ 
IM+l 


1 U+ > r-C-t 


and U'to C. 


1 + 2-By ^ 

rplTtild 
j — 1 1 


Ffr ± C r/+ ,+2) , the lemma holds for m = l + 1. 


J U 

□ 


We emphasize here that the tasks in A\ U 77 1 U + U 
• • • U 77 k have been considered and sequenced in the non¬ 
increasing order of the marginal value. Recall the compo¬ 


sition of 7n 


hC+i] 


and the allocation to 77 


{i,t„ 


m— 1 


u Ffr 


in 


Lemma 4.2. OPT4++4+ = ^eufr^ «k+ i,i • 

Proof. When to = K, di < tfr for all Tj £ A f and F m +i = 0. 
The allocation of AC in [Cfr + l,i^ +1 ] yields no value. By 


fact corresponds to the allocation of 77+ m+1J in [1, tfr]. Let 
Vfr +1 be the total value associated with the area C m+ \. Since 

ri 1 l? 1 uP+-Er=oQ > 0andri 1 5 1 up+ c ii 1,e+l] ,b y 


the way that we obtain C m +1 
schedule that achieves OPp4+ 1 ’ t m+ 1 


in Lemma 4.4 and the opti mal 


in Lemma 


4.3 


we 


□ have that 


41,4* 

m+1 X^K+l,! 

++, and ++ = {T^ 41 + e uf+V+j- We 


vL 


> 


OPT 1 




L iC+l,i I 1 ^ “ 7 =m+ 2 ^VJ 
also let Af~ = U+7+ and Affr = Uyfrfr +1 lZj. Here, 
+ 1+1 = ■7'm +1 U P m+1 and Af = Af m U Affr. 

Lemma 4.3. With the relaxed constraints of tasks, we have for
all $m \in [K-1]$ that $OPT^{[t_m^{th}+1,\, t_{m+1}^{th}]}$ can be achieved by the
following schedule:

(1) D+ li ’ m+1 resources are allocated to every task 
-[M+1. 


The conclusion above in fact shows that the average 
marginal value of C rn+ \ is no less than the one in an optimal 


schedule. Finally, we have that OPT7+ 1 ’ t m+ i ] < 
all to £ [AT — 1], 


VI. 


for 


By Lemma 4.1 OPT I 1,4 * J < E/=i + / r - Lemma 


± K+\,i Ln 

(2) For the unused resources in $[t_m^{th}+1,\, t_{m+1}^{th}]$, we execute
the following loop until there are no available resources
or $\mathcal{F}_{m+1} \cup \mathcal{N}_m^{+}$ is empty: select a task in $\mathcal{F}_{m+1} \cup \mathcal{N}_m^{+}$
with the maximum marginal value and allocate as many
resources as possible subject to the deadline constraint; delete
this task from $\mathcal{F}_{m+1} \cup \mathcal{N}_m^{+}$.


we further have that 

K 

OPT [1 ’ t] < V-/t + OPT^+LH 
7=1 

[4+i,+ 1 ] 


4.2 


s(Eh+ E 

7=1 Ti£uf =1 Aj 

< Y Vi / r - 


u K+l,i 


)/r 


Hence, Theorem 4.1 holds.
Algorithm 8: GreedyRLM

Input : n tasks with $type_i = \{v_i, d_i, D_i, k_i\}$
Output: A feasible allocation of resources to tasks

1  initialize: $y_i(t) \leftarrow 0$ for all $T_i \in \mathcal{T}$ and $1 \le t \le d$, $m = 0$, $t_0^{th} = 0$;
2  sort tasks in the non-increasing order of the marginal values: $v'_1 \ge v'_2 \ge \cdots \ge v'_n$;
3  $i \leftarrow 1$;
4  while $i \le n$ do
5      if $\sum_{t \le d_i} \min\{W(t), k_i\} \ge D_i$ then
6          Allocate-A(i); // in the (m+1)-th phase
7      else
8          if $T_{i-1}$ has ever been accepted then
9              $m \leftarrow m + 1$; // in the m-th phase, the allocation to $\mathcal{A}_m$ was completed; the first rejected task is $T_{j_m} = T_i$
10         while $\sum_{t \le d_{i+1}} \min\{W(t), k_{i+1}\} < D_{i+1}$ do
11             $i \leftarrow i + 1$;
           /* the last rejected task is $T_i$ and $\mathcal{R}_m = \{T_{j_m}, \cdots, T_i\}$ */
12         if $c_m \ge c'_m$ then
13             $t_m^{th} \leftarrow c_m$;
14         else
15             set $t_m^{th}$ to the time slot just before the first time slot t with $W(t) > 0$ after $c_m$, or to $c'_m$ if there is no time slot t with $W(t) > 0$ in $[c_m, c'_m]$;
16     $i \leftarrow i + 1$;


4.1.3 Optimal Algorithm Design

We now introduce the executing process of the optimal
greedy algorithm GreedyRLM, presented as Algorithm 8.
GreedyRLM proceeds as follows:

(1) It considers the tasks in the non-increasing order of the
marginal value.

(2) In the m-th phase, for a task $T_i$ being considered,
if it satisfies the allocation condition
$\sum_{t \le d_i} \min\{W(t), k_i\} \ge D_i$, it calls Allocate-A(i) to
make $T_i$ fully allocated. The executing process of
Allocate-A(i) and its correctness have been explained
in Section 3.

(3) If the allocation condition is not satisfied, it sets the
threshold parameter $t_m^{th}$ of the m-th phase in the way
defined by lines 8-15 of Algorithm 8.
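The admission test and the threshold rule described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's GreedyRLM: `avail[t]` is read as W(t), taken here to be the number of machines still idle at slot t; the inner fill loop is a hypothetical placeholder for Allocate-A (it never reallocates earlier tasks and ignores phases), so the accepted set may differ from the one GreedyRLM produces. The names `Task`, `admit_greedily` and `threshold` are introduced only for this example, and `c`, `c_prime` stand for $c_m$ and $c'_m$, which are supplied by the caller.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    value: float   # v_i
    deadline: int  # d_i, time slots are numbered 1..d_i
    workload: int  # D_i
    bound: int     # k_i, parallelism bound


def admit_greedily(tasks: List[Task], machines: int) -> List[int]:
    """Return the indices of accepted tasks in the order they were considered."""
    horizon = max(task.deadline for task in tasks)
    avail = [machines] * (horizon + 1)  # avail[t] plays the role of W(t); index 0 unused
    # Non-increasing order of marginal value v_i / D_i.
    order = sorted(range(len(tasks)),
                   key=lambda i: tasks[i].value / tasks[i].workload,
                   reverse=True)
    accepted = []
    for i in order:
        task = tasks[i]
        # Allocation condition: sum over t <= d_i of min{W(t), k_i} >= D_i.
        usable = sum(min(avail[t], task.bound) for t in range(1, task.deadline + 1))
        if usable < task.workload:
            continue  # rejected
        # Placeholder for Allocate-A: fill idle capacity backwards from the deadline.
        remaining = task.workload
        for t in range(task.deadline, 0, -1):
            take = min(avail[t], task.bound, remaining)
            avail[t] -= take
            remaining -= take
        accepted.append(i)
    return accepted


def threshold(c: int, c_prime: int, avail: List[int]) -> int:
    """Threshold rule of lines 12-15, with c = c_m and c_prime = c'_m."""
    if c >= c_prime:
        return c
    for t in range(c + 1, c_prime + 1):
        if avail[t] > 0:
            return t - 1  # the slot just before the first slot with idle machines
    return c_prime


example_tasks = [Task(10, 2, 4, 2), Task(6, 2, 3, 2), Task(1, 3, 1, 1)]
accepted_example = admit_greedily(example_tasks, machines=2)
```

In the example, the first task consumes both machines in slots 1 and 2, the second task then fails the allocation condition, and the third still fits in slot 3.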

Proposition 4.2. GreedyRLM gives an $\frac{s-1}{s}$-approximation to
the optimal social welfare with a time complexity of $O(n^2)$.


By Lemma 3.10, the time complexity of GreedyRLM is
$O(n^2)$. In what follows, we prove Proposition 4.2. With Theorem 4.1,
we only need to prove that Features 4.1 and 4.2
hold in GreedyRLM where $r = \frac{s-1}{s}$, which is given in
Propositions 4.3 and 4.4.


Proposition 4.3. Upon completion of GreedyRLM, Feature 4.1
holds in which $r = \frac{s-1}{s}$.


Proof. Upon completion of the m-th phase of GreedyRLM,
consider a task $T_i \in \cup_{j=1}^{m} \mathcal{R}_j$ such that $d_i = c_m$. Since
$T_i$ is not accepted when being considered, it means that
$D_i > \sum_{t \le d_i} \min\{k_i, W(t)\}$ at that time and there are at
most $len_i - 1 = \lceil D_i / k_i \rceil - 1$ time slots t with $W(t) \ge k_i$ in
$[1, c_m]$. We assume that the number of the current time slots
t with $W(t) \ge k_i$ is $\mu$. Since $T_i$ cannot be fully allocated, we
have the current resource utilization in $[1, c_m]$ is at least

$$\frac{C d_i - \mu C - (D_i - \mu k_i)}{C d_i} \ge \frac{C d_i - D_i - (len_i - 1)(C - k_i)}{C d_i} = \frac{C (d_i - len_i) + (C - k_i) + (len_i k_i - D_i)}{C d_i} \ge \frac{s-1}{s} = r.$$

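As a numerical sanity check of the utilization bound in this proof, the short Python sketch below evaluates the successive expressions for one concrete instance. The function name `utilization_chain` and the sample numbers are hypothetical. The last returned quantity, $1 - len_i/d_i$, is what the third expression reduces to after dropping the non-negative terms $(C - k_i)$ and $(len_i k_i - D_i)$; it is at least $\frac{s-1}{s}$ if s is a lower bound on $d_i / len_i$, which is an assumption here since s is not defined in this part of the text.

```python
from math import ceil


def utilization_chain(C: int, d: int, D: int, k: int, mu: int):
    """Evaluate the lower bounds on resource utilization in [1, c_m], with d = d_i = c_m."""
    length = ceil(D / k)  # len_i, the minimum execution time of the task
    # The proof assumes mu <= len_i - 1 and a parallelism bound no larger than C.
    assert 0 <= mu <= length - 1 and k <= C
    first = (C * d - mu * C - (D - mu * k)) / (C * d)
    second = (C * d - D - (length - 1) * (C - k)) / (C * d)
    third = (C * (d - length) + (C - k) + (length * k - D)) / (C * d)
    return first, second, third, 1 - length / d


first, second, third, floor_bound = utilization_chain(C=4, d=6, D=5, k=2, mu=1)
```

For these sample values the chain evaluates to 17/24, 15/24, 15/24 and 1/2, in non-increasing order as the proof requires.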

We assume that $T_i \in \mathcal{R}_h$ for some $h \in [m]^+$. Allocate-A(j)
consists of two functions: Fully-Utilize(j) and
AllocateRLM(j, 0). Fully-Utilize(j) will not change the allocation
to the previously accepted tasks at any time slot. In
AllocateRLM(j, 0), the operations of changing the allocation
to other tasks happen in its call to Routine(A, 0, 1). Due
to the function of lines 9-11 of Routine(A, 0, 1), after the
completion of the h-th phase of GreedyRLM, the subsequent
call to Allocate-A(j) will never change the current allocation
of $\cup_{j=1}^{h} \mathcal{A}_j$ in $[1, t_h^{th}]$. Hence, if $t_m^{th} = c_m$, the lemma holds
with regard to $\cup_{j=1}^{m} \mathcal{A}_j$ ($m \ge h$); if $t_m^{th} > c_m$, since each time
slot in $[c_m + 1, t_m^{th}]$ is fully utilized, the resource utilization
in $[c_m + 1, t_m^{th}]$ is 1 and the final resource utilization will also
be at least r. □


Lemma 4.5. Due to the function of the threshold an d its 
definition, we have for all Ti £ A m that 


(1) [tj + 1, d] is optimally utilized by Ti upon completion of 


(2) D 


Allocate-A(i), where m < j < K; 


= DI 


'+LL+1] 


K+l,i 

Proof. The time slots f^ +1, • • • ,t^ +1 are not fully utilized 
upon completion of Alloca te-A (i) by Lemma |3.8| and the 
definition of t 


th 


3.9 


„ l . By Lemma 

the time slots in [tj h + 1, d], namely, either 


(1) and (2), Ti fully utilizes 

Vi(t) 


t=tf +1 

or yi(t) = ki for all t £ [tj h + 1 ,df upon completion of 
Allocate-A(z). 

In the subsequent execution of Allocate- A(l) for 7/ £ A/,, 
where m < h < K, due to the fact that tff + L ■ ■ • , Tjf + 
1 are not fully utilized upon completion of Allocate-A(Z), 
the total allocation of Ti in [TJ 1 + 1, d] keeps invariant by 


Lemma 3.9 (3) for h < j < K. Due to the function of the 
threshold parameter in the 7-th phase of GreedyRLM 
(line 9 of Routine(-)), when AllocateRLM(z, 0) is dealing with 
a time slot t £ [tjf_ + 1, d,], the allocation change of other 
tasks can only occur in [t i jf_ 1 +1, t]. Hence, we have that the 
total allocation of Tj in [t\f _ : + 1, d] keeps invariant and the 
allocation of Ti at every time slot in [1, keeps invariant. 
We can therefore conclude that the total allocation of $T_i$ in
each $[t_j^{th} + 1, t_{j+1}^{th}]$ keeps invariant upon every completion
of Allocate-A(l). Hence the lemma holds. □


Proposition 4.4. [t^ + is optimally utilized by 

{+°5rV* e U™ 0 A}. 
Proof. We observe the allocation of a task $T_i \in \cup_{j=0}^{m} \mathcal{A}_j$ after


the completion of GreedyRLM. If lent < di — x , we have 
that = 0. If di — < lerii < di — t we have 

that = Di - ki(di - C +1 ) and D 1 ^’^ = 

^k’+ 1 ^ di —tm < lerii, we have that Z^'+Tt^ = Di ~ 


fc iK-C+t) and -DifTi 

lemma holds by Lemma 


[‘m+Mm + ll 


3.1 


= - O- Hence ^ 

□ 


4.2 Dynamic Programming 

For any solution, there must exist a feasible schedule for
the tasks selected to be fully allocated by this solution. So,
the set of tasks in an optimal solution satisfies the boundary
condition by Lemma 3.2. Then, to find the optimal solution,
we only need to address the following problem: if we are given
C machines, how can we choose a subset $\mathcal{S}$ of tasks in
$\mathcal{D}_1 \cup \cdots \cup \mathcal{D}_L$ such that (i) this subset satisfies the boundary
condition, and (ii) no other subset of selected tasks achieves
a better social welfare? This problem can be solved via
dynamic programming (DP). To propose a DP algorithm,
we need to identify a dominant condition for the model of
this paper [13]. Let $\mathcal{F} \subseteq \mathcal{T}$ and we define an L-dimensional
vector

$$H(\mathcal{F}) = \left(\lambda_1^C(\mathcal{F}) - \lambda_0^C(\mathcal{F}), \cdots, \lambda_L^C(\mathcal{F}) - \lambda_{L-1}^C(\mathcal{F})\right),$$

where $\lambda_m^C(\mathcal{F}) - \lambda_{m-1}^C(\mathcal{F})$, $m \in [L]^+$, denotes the optimal
resource that $\mathcal{F}$ can utilize on C machines in the segmented
timescale $[\tau_{L-m} + 1, \tau_{L-m+1}]$ after $\mathcal{F}$ has utilized $\lambda_{m-1}^C(\mathcal{F})$
resource in $[\tau_{L-m+1} + 1, \tau_L]$. Let $v(\mathcal{F})$ denote the total
value of the tasks in $\mathcal{F}$. We say that
one pair $(\mathcal{F}, v(\mathcal{F}))$ dominates another $(\mathcal{F}', v(\mathcal{F}'))$ if
$H(\mathcal{F}) = H(\mathcal{F}')$ and $v(\mathcal{F}) \ge v(\mathcal{F}')$, that is, the solution to
our problem indicated by $(\mathcal{F}, v(\mathcal{F}))$ uses the same amount
of resources as $(\mathcal{F}', v(\mathcal{F}'))$, but obtains at least as much
value.

We now give the general DP procedure DP($\mathcal{T}$) [13]. Here,
we iteratively construct the lists A(j) for all $j \in [n]^+$. Each
A(j) is a list of pairs $(\mathcal{F}, v(\mathcal{F}))$, in which $\mathcal{F}$ is a subset
of $\{T_1, T_2, \cdots, T_j\}$ satisfying the boundary condition and
$v(\mathcal{F})$ is the total value of the tasks in $\mathcal{F}$. Each list
maintains only the dominant pairs. Specifically, we start with
$A(1) = \{(\emptyset, 0), (\{T_1\}, v_1)\}$. For each $j = 2, \cdots, n$, we first
set $A(j) \leftarrow A(j-1)$, and for each $(\mathcal{F}, v(\mathcal{F})) \in A(j-1)$,
we add $(\mathcal{F} \cup \{T_j\}, v(\mathcal{F} \cup \{T_j\}))$ to the list A(j) if $\mathcal{F} \cup \{T_j\}$
satisfies the boundary condition. We finally remove from
A(j) all the dominated pairs. DP($\mathcal{T}$) will select a subset $\mathcal{S}$ of
$\mathcal{T}$ from all pairs $(\mathcal{F}, v(\mathcal{F})) \in A(n)$ so that $v(\mathcal{F})$ is maximum.
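The list construction with dominance pruning can be sketched in Python as below. This is a generic illustration, not the paper's DP($\mathcal{T}$): the state function `state_of` (standing in for $H(\mathcal{F})$) and the predicate `feasible` (standing in for the boundary condition) are hypothetical callables supplied by the caller. The usage example replaces them with a plain knapsack-style stand-in, a single capacity on total workload, purely to make the sketch runnable.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Sequence, Tuple

Subset = FrozenSet[int]


def dominance_dp(values: Sequence[float],
                 state_of: Callable[[Subset], Hashable],
                 feasible: Callable[[Subset], bool]) -> Tuple[float, Subset]:
    """Keep, for every reachable state, only the best-valued subset (the dominant pair)."""
    empty: Subset = frozenset()
    table: Dict[Hashable, Tuple[float, Subset]] = {state_of(empty): (0.0, empty)}
    for j, task_value in enumerate(values):
        new_table = dict(table)  # A(j) <- A(j - 1)
        for value, subset in table.values():
            candidate = subset | {j}
            if not feasible(candidate):
                continue
            key = state_of(candidate)
            candidate_value = value + task_value
            # Two pairs with the same state: the one with lower value is dominated.
            if key not in new_table or new_table[key][0] < candidate_value:
                new_table[key] = (candidate_value, candidate)
        table = new_table
    return max(table.values(), key=lambda pair: pair[0])


# Stand-in instance: the state is the total workload, feasibility is a capacity check.
workloads = [1, 2, 3]
capacity = 5


def total_workload(subset: Subset) -> int:
    return sum(workloads[i] for i in subset)


best_value, best_set = dominance_dp([6.0, 10.0, 12.0],
                                    total_workload,
                                    lambda s: total_workload(s) <= capacity)
```

Storing one pair per state is what bounds the table size by the number of distinct states, which is where the dependence on the size of the state space in the running time comes from.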

Proposition 4.5. DP($\mathcal{T}$) outputs a subset $\mathcal{S}$ of $\mathcal{T} = \{T_1, \cdots, T_n\}$
such that $v(\mathcal{S})$ is the maximum value subject to the
condition that $\mathcal{S}$ satisfies the boundary condition; the time
complexity of DP($\mathcal{T}$) is $O(n d^L C^L)$.


Proof. The proof is similar to the one in the knapsack
problem [13]. By induction, we need to prove that A(j)
contains all the non-dominated pairs corresponding to feasible
sets $\mathcal{F} \subseteq \{T_1, \cdots, T_j\}$. When j = 1, the proposition
holds obviously. Now suppose it holds for A(j - 1). Let
$\mathcal{F}' \subseteq \{T_1, \cdots, T_j\}$ and suppose $H(\mathcal{F}')$ satisfies the boundary
condition. We claim that there is some pair $(\mathcal{F}, v(\mathcal{F})) \in A(j)$
such that $H(\mathcal{F}) = H(\mathcal{F}')$ and $v(\mathcal{F}) \ge v(\mathcal{F}')$. First, suppose
that $T_j \notin \mathcal{F}'$. Then, the claim follows by the induction
hypothesis and by the fact that we initially set A(j) to
A(j - 1) and removed dominated pairs. Now suppose
that $T_j \in \mathcal{F}'$ and let $\mathcal{F}'_1 = \mathcal{F}' - \{T_j\}$. By the induction
hypothesis there is some $(\mathcal{F}_1, v(\mathcal{F}_1)) \in A(j-1)$ that dominates
$(\mathcal{F}'_1, v(\mathcal{F}'_1))$. Then, the algorithm will add the pair
$(\mathcal{F}_1 \cup \{T_j\}, v(\mathcal{F}_1 \cup \{T_j\}))$ to A(j). Thus, there will be some
pair $(\mathcal{F}, v(\mathcal{F})) \in A(j)$ that dominates $(\mathcal{F}', v(\mathcal{F}'))$. Since the
size of the space of $H(\mathcal{F})$ is no more than $(C \cdot d)^L$, the time
complexity of DP($\mathcal{T}$) is $O(n d^L C^L)$. □

Proposition 4.6. Given the subset $\mathcal{S}$ output by DP($\mathcal{T}$), LDF($\mathcal{S}$)
gives an optimal solution to the welfare maximization problem
with a time complexity $O(\max\{n d^L C^L, n^2\})$.

Proof. It follows from Propositions 4.5 and 3.1. □

Remark. As in the knapsack problem [13], to construct the
algorithm DP($\mathcal{T}$), the pairs of the possible state of resource
utilization and the corresponding best social welfare have to
be maintained and an L-dimensional vector has to be defined
to indicate the resource utilization state. This seems to imply
that we cannot make the time complexity of a DP algorithm
polynomial in L.

4.3 Machine Minimization

Given a set of tasks $\mathcal{T}$, the minimal number of machines
needed to produce a feasible schedule of $\mathcal{T}$ is exactly the
minimum C* such that the boundary condition is satisfied.
An upper bound of the minimum C* is $k \cdot n$ and this
minimum C* can be obtained through binary search in
$O(\log(k \cdot n))$ iterations; hence, we have the following
conclusion from Proposition 3.1 and Lemma 3.2:

Proposition 4.7. There exists an exact algorithm for the machine
minimization objective with a time complexity of $O(n^2)$.

4.4 Minimizing Maximum Weighted Completion Time

Under the task model of this paper and for the objective
of minimizing the maximum weighted completion time
of tasks, a direct application of LDF($\mathcal{S}$) improves the
algorithm in [10] by a factor 2.

In [10], with a polynomial time complexity, Nagarajan
et al. find a completion time $d_j$ for each task $T_j$ that is
$1 + \epsilon$ times the optimal in terms of the objective here; then
they propose a scheduling algorithm where each task can
be completed by the time at most 2 times $d_j$. Hence, a
$(2 + 2\epsilon)$-approximation algorithm is obtained. Using
the optimal scheduling algorithm LDF($\mathcal{S}$) instead, we have that

Proposition 4.8. There is a $(1 + \epsilon)$-approximation algorithm
for scheduling independent malleable tasks under the objective of
minimizing the maximum weighted completion time of tasks.

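Returning to the machine minimization objective of Section 4.3, the binary search over the number of machines can be sketched in Python as follows. The predicate `feasible` is a hypothetical stand-in for checking the boundary condition on a given number of machines; it is assumed to be monotone (once true, it stays true for larger values) and true at the upper bound $k \cdot n$. The toy oracle in the usage example only covers tasks that share one deadline and individually fit within their parallelism bounds, and is not the paper's boundary condition.

```python
from typing import Callable


def min_machines(upper: int, feasible: Callable[[int], bool]) -> int:
    """Smallest C in [1, upper] with feasible(C) true, assuming monotonicity."""
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid       # mid machines suffice; try fewer
        else:
            lo = mid + 1   # mid machines are not enough
    return lo


# Toy oracle: three tasks with a common deadline of 3 slots.
toy_workloads = [6, 4, 5]  # D_i
toy_bounds = [2, 2, 3]     # k_i, each D_i <= k_i * deadline
toy_deadline = 3


def toy_feasible(machines: int) -> bool:
    # With one shared deadline, total capacity is the only remaining constraint.
    return sum(toy_workloads) <= machines * toy_deadline


c_star = min_machines(max(toy_bounds) * len(toy_workloads), toy_feasible)
```

The loop halves the candidate range each iteration, so it calls the oracle a logarithmic number of times in the upper bound.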


5 Conclusion 

In this paper, we study the problem of scheduling n
deadline-sensitive malleable batch tasks on C identical machines.
Our core result is a new theory to give the first
optimal scheduling algorithm so that C machines can be
optimally utilized by a set of batch tasks. We further derive
four algorithmic results in obvious or non-obvious ways: (i)
the best possible greedy algorithm for social welfare maximization
with a polynomial time complexity of $O(n^2)$ that
achieves an approximation ratio of $\frac{s-1}{s}$, (ii) the first dynamic
programming algorithm for social welfare maximization
with a polynomial time complexity of $O(\max\{n d^L C^L, n^2\})$,
(iii) the first exact algorithm for machine minimization
with a polynomial time complexity of $O(n^2)$, and (iv) an
improved polynomial time approximation algorithm for the
objective of minimizing the maximum weighted completion
time of tasks, reducing the previous approximation ratio by
a factor 2. Here, L and d are the number of deadlines and
the maximum deadline of tasks.